
Eisler's NFS Blog

Stuff about all things NFS.

Last Build Date: Thu, 18 Apr 2013 01:50:30 +0000


Red Hat Releases RHEL 6.2 with pNFS support

Tue, 06 Dec 2011 21:55:00 +0000

Hat tip to Trond Myklebust, the official NFS client maintainer for Linux.

Today the NFS Industry received an early Christmas present from Red Hat.

Today Red Hat announced the availability of RHEL 6.2.

The RHEL 6.2 features list from Red Hat explicitly mentions pNFS (for the files layout only) as supported in RHEL 6.2.

This pNFS client capability complements NetApp's release last month of its Data ONTAP pNFS server.

Note that the Linux community has achieved a first here: this is the first time a vendor of an enterprise-class Linux distribution beat all other commercial operating systems to market with an NFS client. Historically, Linux was way behind commercial operating systems in delivering NFSv3 and NFSv4.0 (by 5 to 10 years). Note that this was achieved slightly less than 3 years after the NFSv4.1 and pNFS standards were ratified, and less than two years after those standards were published as RFC 5661.

I want to extend my congratulations and thanks to Trond, Red Hat, the NetApp Linux NFS client engineering team, and Linux development community for the hard work of the past several years that went into this milestone.

NetApp has shipped its pNFS server

Mon, 21 Nov 2011 16:46:00 +0000

This week is Thanksgiving in the USA, and today the NFS industry has much to be thankful for: Release Candidate 2 of Data ONTAP 8.1 was posted to NetApp's site and is available for download now. Release Candidate 2 introduces NetApp's pNFS server for Data ONTAP Cluster Mode, as well as the NFSv4.1 server necessary to enable pNFS functionality. This pNFS server supports the files-based layout type, aka LAYOUT4_NFSV4_1_FILES.

As described in Red Hat's documentation, the RHEL 6.2 beta release includes a tech preview of Red Hat's upcoming pNFS client for Linux. You can also go the Fedora route.

Two questions that we often get about our pNFS server are:
  1. Is there a single pNFS metadata server? Answer: no, every node in a Data ONTAP 8 Cluster Mode storage cluster is capable of being a metadata server. The NFSv4.1 client simply NFS mounts a volume via any node of the storage cluster, and that node acts as a metadata server.
  2. What happens when a node hosting a metadata server encounters a failure? The Data ONTAP 8 Cluster Mode system is designed to be fault tolerant if there are two or more nodes in the cluster. Another node will be assigned the network interfaces (essentially, IP addresses) of the failed node, and the NFSv4.1 client will re-connect to the new node, discover that there has been a metadata server failure, and if necessary obtain new layouts for any open files that were being accessed over pNFS.
Enjoy NetApp's pNFS server, and have a great Thanksgiving.

and offline, material available at

Fri, 30 Sep 2011 20:08:00 +0000

I've received inquiries about (formerly going offline.

Via the Wayback Machine, Tom Haynes has hopefully restored all of the material, which is mostly presentations made at this valuable, but unfortunately now defunct, conference between the years 2000 and 2005, inclusive.

Thank you, Tom, for providing this community service. I hope the time you spent qualifies under NetApp's volunteer benefit.

Will be blogging about NFS here for a while

Tue, 02 Aug 2011 21:54:00 +0000

NetApp is going in a different direction with corporate blogging, and until that gets resolved with respect to the NFS blog I used to post there, I will be posting here. If the blog location changes again, I will delete this post and replace it with one that links to the new spot.

I am working on getting the posts that used to be at back online.

Thanks in advance for your understanding.

pNFS client is now part of Fedora 15

Tue, 02 Aug 2011 21:46:00 +0000

Trond Myklebust, the official Linux NFS client maintainer, told me today:

FYI: as of this morning, Fedora 15 is shipping with 'kernel-2.6.40', which is basically a renamed 3.0 kernel (presumably to avoid trouble with shell scripts that check for the '2.6.x' prefix).

The kernel is shipping with both 'files' and 'objects' pNFS modules.
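The parenthetical about shell scripts hints at why the renaming mattered. Here is a hypothetical sketch (not from Fedora's actual packaging scripts) of the kind of naive version test that breaks on a 3.0 kernel, and a numeric comparison that doesn't:

```python
# Hypothetical illustration of why "kernel-2.6.40" is safer than "3.0" for
# old scripts: a textual prefix check written in the 2.6 era misclassifies
# a 3.0 kernel entirely.

def naive_is_modern_kernel(release: str) -> bool:
    # Breaks on 3.0: assumes 2.6 is the only modern series.
    return release.startswith("2.6.")

def robust_is_modern_kernel(release: str) -> bool:
    # Compare (major, minor) numerically instead of textually.
    major, minor = (int(x) for x in release.split(".")[:2])
    return (major, minor) >= (2, 6)

print(naive_is_modern_kernel("3.0.0"))   # False -- the naive check fails
print(robust_is_modern_kernel("3.0.0"))  # True
print(naive_is_modern_kernel("2.6.40"))  # True -- hence the renamed kernel
```

Shipping the 3.0 kernel under a 2.6.x name keeps both styles of check working.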

VMware over NFS?

Tue, 11 Sep 2007 23:06:00 +0000

Nick Triantos of NetApp blogs about blocks-based storage protocols and the NetApp perspective. Inasmuch as I am a public face of NFS zealotry for NetApp, Nick has assumed a similar role for NetApp's blocks protocols. I've been busy with the NFS track (more in another post coming up) at the SNIA conference this week, but just read Nick's blog post, where he makes a strong case for running VMware over NFS.

There is also a presentation at VMworld from Peter Learmonth and Kim Weller of NetApp, and Bud James of BEA that delves more deeply into this notion. The presentation is password protected, but the web page that presented the link to the presentation also provided this password information:

user name: cbv_rep
password: cbvfor9v9r
If the link and password don't work, you can get to the presentation by searching for Learmonth in the Speaker Name field.

The actual presentation will be Wednesday, September 12, 2007 at the Moscone convention center in San Francisco.

NFSv4.1 at the SNIA Developers Conference

Fri, 31 Aug 2007 12:15:00 +0000

NetApp's PR department issued a press release telling the world that my co-editors, Dave Noveck of NetApp and Spencer Shepler of Sun, and I will be presenting the NFSv4.1 protocol at SNIA's Storage Developer's Conference on September 11, 2007.

That's my signal to finish up some slides. :-)

An NFSv4 ACL editor

Thu, 26 Jul 2007 16:00:00 +0000

Let's say you have to use NFSv3 but need Access Control Lists (ACLs). Let's say your NFSv3 server does not support one of the many proprietary draft-POSIX ACL protocols, but your server does have NFSv4 support and NFSv4 ACLs. Let's also say that an NFSv4 ACL on your server is enforced on NFSv3 access. Is there a way to use NFSv4 ACLs without having an NFSv4 client?

Yes. The idea is to use a user-level NFSv4 client that implements enough of the NFS protocol to read and write NFSv4 ACLs.

A while back I wrote such a beast and it is available at:

It has been ported to Solaris and Linux.

The user interface isn't as nice as I'd like, nor does it support Kerberos V5 authentication. But rather than wait for such things to get done in my "ample spare time", I think it is worthwhile to make it more widely known that this software exists. Feedback welcome. If this proves popular, I'll find time to add requested features and bug fixes.

NFSv4.1 Bakeathon and pNFS

Wed, 20 Jun 2007 02:19:00 +0000

Last week I was at Sun Microsystems' campus in Austin, Texas for the NFSv4.1 bakeathon, where various implementors tested NFSv4.1 against each other. The terms of Sun's confidentiality agreement don't allow me to provide details about companies and organizations that attended and how their code did. What I can say is that a total of 7 organizations, including NetApp, brought implementations to Austin, and all implementors had success with interoperability testing.

NFSv4.1 has two big chunks of functionality: sessions and pNFS. Sessions is a new infrastructure that enables exactly-once semantics and trunking. By "exactly once" we mean that NFSv4.1 will be able to guarantee that every operation is executed exactly once. This is important for "non-idempotent" operations: operations that, if executed twice, return different results, for example the file REMOVE operation. Overcoming non-idempotency is necessary for all filesystems, but it is a significant practical problem when the filesystem and the storage are separated by a potentially unreliable communications link, as is the case with NFS. Because sessions is a large piece of infrastructure, several implementors in Austin focused on getting sessions to work.

pNFS is parallel NFS: the striping of regular files across several data servers. NFSv4.1 entertains several types of data servers:

  1. Blocks-based, where the pNFS client accesses data via Fibre Channel or iSCSI.
  2. Object storage-based, where the pNFS client accesses data via the OSD protocol.
  3. File-based, where the pNFS client accesses data via the NFSv4.1 protocol.

Operations to create and delete files, and to access directories, are always done to a metadata server, regardless of what type of data server is used to store regular files. At Austin, all three pNFS server/data server flavors were there.

Recently Panasas had a press release or two on pNFS, and several articles were written.
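The exactly-once guarantee that sessions provide can be sketched as a server-side reply cache keyed by slot and sequence number. This is a hypothetical illustration of the mechanism, not any vendor's implementation; the names SlotTable and dispatch are invented for this sketch:

```python
# Sketch of an NFSv4.1-style session slot table: each slot remembers the last
# sequence id executed and its cached reply, so a retransmitted non-idempotent
# request (e.g. REMOVE) is answered from the cache instead of executed twice.

class SlotTable:
    def __init__(self, num_slots):
        # per-slot: (last sequence id executed, cached reply)
        self.slots = {s: (0, None) for s in range(num_slots)}

    def dispatch(self, slot, seqid, execute):
        last_seq, cached = self.slots[slot]
        if seqid == last_seq:          # retransmission: replay, don't re-execute
            return cached
        if seqid == last_seq + 1:      # new request: execute and cache the reply
            reply = execute()
            self.slots[slot] = (seqid, reply)
            return reply
        raise ValueError("misordered sequence id")  # cf. NFS4ERR_SEQ_MISORDERED

# A REMOVE executed twice would fail the second time; with the slot table the
# retry gets the cached success instead.
table = SlotTable(num_slots=8)
removed = set()
def remove_file():
    if "lockfile" in removed:
        return "NOENT"
    removed.add("lockfile")
    return "OK"

print(table.dispatch(0, 1, remove_file))  # OK
print(table.dispatch(0, 1, remove_file))  # retry: OK again, from the cache
```

Without the cache, the retry would see "NOENT" even though the original REMOVE succeeded, which is exactly the non-idempotency problem sessions eliminate.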
From my perspective, the Byte and Switch article is perhaps the most interesting one to use as fodder for the rest of this blog post, because it expresses opinions that are easy to take issue with.

"You could say NFS was invented by Bill Joy at Sun back in 1983, and the thing hasn't had a major performance upgrade in two decades."

Well, NFSv3 did add asynchronous I/O, and NFSv4.0 added delegations. In addition, NFS/RDMA adds significant performance wins. I consider pNFS to be yet another step in NFS performance improvements, and doubt it will be the last one either.

"When the IETF approves the new standard, which is anticipated by year's end, Panasas will have a significant first-mover advantage."

Note that pNFS has three flavors of data servers, so this is not necessarily the case. Panasas is backing the OSD data server, whereas EMC and NetApp are backing the blocks and NAS-based data servers, respectively. Given the amount of storage that is accessible via blocks protocols and NFS, versus object protocols, I would expect some impedance in the market to pNFS over OSD, unless EMC, NetApp, and others have no story for moving on blocks and NFS-based data servers. The beauty of the files-based data server is that it uses the same protocol as that used to talk to the pNFS metadata server: NFSv4.1. Proponents of other data server protocols might come to appreciate this beauty, and wrap an NFSv4.1 front end onto their data servers.

At least one analyst thinks enterprises don't really need pNFS to improve the performance of clustered systems. "All of the clustered file system NAS vendors have at some fundamental level data coming in over Ethernet that's served by different nodes," says Arun Taneja of the Taneja Group consultancy. "They all do it differently. Panasas does it in a very different way, and I'd call them the odd duck of the group." But Taneja acknowledges that, if large storage players get behind pNFS, the power of standardization could take [...]

NAS Conference Web Site

Thu, 07 Jun 2007 22:29:00 +0000

The NAS conference (aka the NFS Industry Conference) was a tradition Sun started in the 1980s during the beginnings of NFS but stopped holding by the early 1990s. In the mid 1990s it was brought back. In the 2000s it got much bigger and was expanded to include CIFS. A year or two ago, Sun and SNIA agreed to give SNIA the conference. Anyway, in the last month or so the domain briefly expired, and so all the presentations from the 2000s were offline. After some whining on my part, I'm happy to report the site is back up.

A Database on NetApp Storage blog

Thu, 07 Jun 2007 22:10:00 +0000

Sanjay Gulabani is a performance engineer at NetApp who focuses on databases using NetApp storage. He's recently started a blog to discuss ideas and issues on this topic. I expect he'll write a great deal about using databases over NFS, and Oracle over NFS in particular.

NFSv4.1 Bakeathon in Austin next week

Thu, 07 Jun 2007 22:03:00 +0000

I'm going to Austin, TX next week to attend the NFSv4.1 interoperability testing event at Sun's facility. PNFS (parallel NFS) will be tested by several companies. I'll talk more about pNFS after the testing event. Feel free to post some questions now, and I'll follow up next week.

My reason for going is to get feedback on the NFSv4.1 draft specification, which I've been editing along with Spencer Shepler of Sun and Dave Noveck of NetApp. Back to my editing.

Data ONTAP GX paper summarized in ;login:

Thu, 07 Jun 2007 21:49:00 +0000

The June 2007 issue of ;login: has a summary by Avishay Traeger of the GX paper I co-authored for the 2007 USENIX FAST Conference.

Storage Virtualization and why blogs beat traditional journalism

Fri, 27 Apr 2007 14:14:00 +0000

I was searching Google News for various key words, and imagine my astonishment when I came across:
NetApp VP says storage virtualization overrated
NetApp's VP of emerging products Jay Kidd on staying off the storage virtualization bandwagon, competition with Isilon and NetApp's current identity crisis. - Apr 26, 2007

What gives? Is my fellow employee (who is a smart articulate guy) disrespecting my pride and joy, ONTAP GX? If GX is not storage virtualization, then what is? Considering my previous blog post was on GX, I couldn't let this one slide.

If you read the article, you'll find an excellent conversation between the interviewer, Beth Pariseau, and Jay Kidd, Senior VP of the Emerging Products Group at NetApp. Nowhere does Jay say "storage virtualization [is] overrated". He does discuss the file virtualization that NeoPath, Acopia, and Rainfinity do, and expresses his belief that those businesses are not profitable.

File virtualization != storage virtualization. NetApp sells storage controllers, and products like V-Series and GX are examples of storage virtualization at the storage controller level.

So why are blogs better than traditional journalism? Because bloggers get to pick their own headlines. I know Jay didn't pick "NetApp VP says storage virtualization overrated" as the headline, and I doubt the interviewer, Beth Pariseau, did either.

Report from FAST 2007: Data ONTAP GX Paper

Wed, 21 Feb 2007 01:49:00 +0000

The night before the presentation, Peter Corbett, Dan Nydick, and I worked on the slides Peter was to present. Peter then fine-tuned them, and arrived exactly on time to present them (much to the relief of everyone involved). But the wait was worth it, as Peter definitely improved the product. (I later presented the paper to a data storage class at a university in northern California. You can view that version of the slides [sans performance data, for now at least] on my personal web site.)

At the FAST presentation, there were several questions, which I feverishly attempted to paraphrase. Here they are, with the answers given, and in some cases, my color commentary (in italics):

Q: Was a single file system used in the performance charts (given during the presentation)?

A: A single namespace, with at least one volume per D-blade, was used.

Q: Why doesn't it scale beyond 24 nodes? What happens at 25?

A: We stopped at 24 because we achieved our initial one million operations/second goal. We believe it will scale beyond 24.

Q: What can limit scaling?

A: The replicated coherent database can potentially be a limiter.

Also, I think the other limiter can potentially be the cluster interconnect, but so far switch vendors can build devices more than capable of switching dozens to low hundreds of nodes.

Q: What benchmark is used for CIFS numbers?

A: Currently there is no standard CIFS benchmark, and we didn't prepare CIFS numbers for the presentation.

Also, our CIFS benchmark numbers use aggregate read and write as the NFS numbers do, and will be similar. Note that SFS 4.0 will provide CIFS performance measurements.

Q: Why is write throughput half the read throughput?

A: READs are faster because the benchmark uses sequential I/O, and READs can benefit from read ahead.

Q: For the load balancing mirror feature, aren't you worried about writing multiple mirrors?

A: The load balancing mirrors are read-only. Only the master of a mirror family is writeable.

In the presentation slides I've posted, I've attempted to make this clear.

You can read the paper at my personal website.

Data ONTAP GX paper at FAST 2007 this week

Mon, 12 Feb 2007 19:01:00 +0000

With Peter Corbett, Mike Kazar, Dan Nydick, and Chris Wagner, I submitted a paper on NetApp's Data ONTAP GX architecture and it was accepted for this week's FAST Conference. Peter is scheduled to present our paper this Thursday at 1:30 pm. (Apparently the venue is at the San Jose Marriott).


I'll follow up with a summary of audience questions and reactions.

Connectathon 2007

Mon, 12 Feb 2007 18:44:00 +0000

I'm trying (with mixed success) to travel less this year, and was going to skip Connectathon. However, I currently own the sessions portion of the NFSv4.1 spec, and several developers had issues and questions, so I showed up for a few days. I didn't catch many presentations. Three presentations you might look at:

Dave Noveck (one of my fellow NFSv4.1 specification editors), via his proxy Tom Talpey, presented an excellent summary of what is new in NFSv4.1 versus NFSv4.0.

Ben Rockwood (of the cuddletech storage blog) discussed how he and his employer use NFS in what seems to be an OpenSolaris-only shop. Interestingly, Ben seems to be using bleeding-edge OpenSolaris code, which is a sharp contrast from my experience with how customers use Linux.

Finally, Brent Callaghan of Apple discussed the NFS client and server changes in the upcoming Leopard release of Mac OS X. Brent's talk is a good reminder why a monoculture in the desktop computing space is a bad thing, because Brent and his team produced a lot of interesting ideas and innovations. For example, Leopard adds Kerberized NFS support, joining Solaris, Linux, and AIX among the UNIX-like NFS clients, but rather than stick Kerberos credentials in a ticket file, the tickets are kept in a per-user instance of the gssd daemon. BTW, Leopard will have a rudimentary NFSv4 client.

Slides from my LISA 2006 presentation

Fri, 29 Dec 2006 23:05:00 +0000

My slides are available now at

I will be at the USENIX LISA Conference in D.C. this Thursday

Wed, 06 Dec 2006 05:13:00 +0000

I am scheduled to present on NFSv4 again this year at LISA, this Thursday morning. I'll post slides sometime after.

Review of "Why NFS Sucks" Paper from the 2006 Linux Symposium

Fri, 27 Oct 2006 20:28:00 +0000

Olaf Kirch of SUSE/Novell, a major Linux distributor, gave a talk on NFS on July 26, 2006, at the Linux Symposium. There were some press reports of his presentation, which were stunning in their inaccuracies (e.g., that Sun invented RFS). In fairness, Olaf's paper has fewer errors, and I'll presume, since I wasn't there, that his presentation was no less accurate than his paper. Also, according to first-hand accounts of engineers I've exchanged email with, his presentation was far less critical of NFS than the paper. One attendee told me:

The parts of his talk that I did hear, though, left me with the impression that NFSv4 is the best thing since sliced bread since it fixes all the nits and problems with NFSv2/v3. There were a few inaccuracies, but overall it was actually rather positive.

Kirch's paper pokes lots of holes, some accurately, without always explaining why those holes are there, or how hard it would be to fill them. You might get a better understanding of NFS's warts by reading the original NFSv2 USENIX paper.

The first section, on history, claims that AT&T's RFS predated NFS, and that Sun designed NFS in reaction to weaknesses of RFS. That is reversed. Sun released NFS with SunOS 2.0 in 1985. RFS arrived in System V Release 3, which came in 1987. I was an employee of Lachman Associates, Inc. at the time, when Lachman obtained early access to System V Release 3 source code, and ported NFS from SunOS 2.0 to System V Release 3 during 1985 (Lachman also ported NFS to System V Release 2 in the same time frame). RFS was, if anything, a reaction to NFS, and is a classic example of the problems one will get if 100% adherence to POSIX semantics is the primary goal of a remote file access protocol. Kirch's explanation of the problems with RFS is correct, but later in his paper he criticizes NFS for not going down the same road.

The paper claims that the NFSv3 specification was written mostly by Rick Macklem, and published in 1995.
RFC 1813, published indeed in 1995, documents the specification, but it was made available in PostScript form by Sun in 1993. The primary contributors to the specification were Brian Pawlowski, Peter Staubach, Brent Callaghan, and Chet Juszczak (Chet being the catalyst for finally getting the NFS industry to sit down at the 1992 Connectathon and get serious about NFSv3). Rick certainly contributed to the NFSv3 specification, but so did several others, and they are listed in the acknowledgements of RFC 1813. For what it is worth, Rick's contributions to NFSv3 outweighed mine.

Regarding the claim that WebNFS gained no real following outside of Sun: I know of many NetApp customers that use it from Solaris clients to NetApp filers. Without NFSv4, it is the most practical way to use NFS through a firewall. It is certainly the case that web browsers unfortunately don't support nfs:// URLs, though I noticed Mac OS X uses the nfs:// syntax for some applications. In the Linux world there's no WebNFS following, but that is a function of no support for it in Linux.

Kirch states that the NFSv4 working group formed in reaction to Microsoft rebranding SMB as CIFS. Actually, the rebranding took place after Sun announced WebNFS. The Sun-hosted NFSv4 BOF at the 1996 San Jose IETF meeting took place after the Microsoft-hosted SMB BOF at the 1996 Montreal IETF meeting. I was at the SMB BOF in Montreal, and then co-chaired (with Brent Callaghan) the NFSv4 BOF at San Jose. Readers are free to connect the dots.

In the section on NFS file handles, Kirch notes the difficulties the Linux dentry model poses for NFS. NFS was around for year[...]

OSDL's NFSv4 Press Release

Fri, 27 Oct 2006 18:32:00 +0000

I got a question about the implications of this excerpt from OSDL's NFSv4 press release:

The Open Source Development Labs (OSDL), the global consortium dedicated to accelerating the adoption of Linux® and open source software, today announced that the Network File System v4 (NFSv4) for Linux is available in Red Hat Enterprise Linux from Red Hat and SUSE Linux Enterprise from Novell. This milestone reflects the maturity of NFSv4 for Linux in the enterprise and coincides with Network Appliance’s latest donation of $100,000 to the NFSv4 testing community.

''NFS testing has been a key priority for OSDL and the Linux development community, and we have passed a significant milestone for it to be ready for enterprise validation,'' said Stuart Cohen, CEO of OSDL.
First, this is all good news, and it is consistent with the claims I made last year at SNIA and LISA that, unlike the history with NFSv3, Linux is not lagging the industry on NFSv4. There are several commercial NFS vendors that are behind Linux in NFSv4 support.

Second, given the juxtaposition of "test", "significant milestone", "enterprise", and "Linux", a reasonable reader might conclude that OSDL is stating that Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise (SLE) have passed all of OSDL's NFSv4 tests, and that NFSv4 on the current releases of those two distributions is enterprise ready.

I asked around, and apparently OSDL did its testing on mainline Linux kernel code, and not RHEL or SLE. RHEL and SLE, at the time this blog post was written, did not have all the necessary NFSv4 updates. I'm told that RHEL and SLE will need several updates from the mainline code before both distributions have an NFSv4 implementation that is "ready for enterprise validation."

XDR is now a full Standard!

Wed, 10 May 2006 01:28:00 +0000

Why am I using a fixed-width font? Because today the RFC Editor published RFC 4506, the specification for XDR, and it is only fitting to use the IETF's preferred character spacing to note this event. XDR is the data encoding standard for ONC RPC and NFS.
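For readers who haven't looked at XDR before, its core rules are simple: everything is encoded big-endian in 4-byte units, and variable-length data is a 4-byte length followed by the bytes, zero-padded to a multiple of 4. A minimal sketch of those rules from RFC 4506, using only the standard library (the helper names are my own):

```python
# Minimal XDR encoding per RFC 4506: unsigned integers are 4-byte big-endian;
# strings are a 4-byte length, then the bytes, zero-padded to 4-byte alignment.
import struct

def xdr_uint(n: int) -> bytes:
    return struct.pack(">I", n)

def xdr_string(s: str) -> bytes:
    data = s.encode("ascii")
    pad = (4 - len(data) % 4) % 4   # pad to a multiple of 4 bytes
    return xdr_uint(len(data)) + data + b"\x00" * pad

# "NFS" is 3 bytes, so one zero byte of padding keeps 4-byte alignment.
print(xdr_string("NFS").hex())  # 000000034e465300
```

That fixed 4-byte alignment, with no per-field tags, is what keeps XDR both compact and trivially portable across architectures.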

This is the culmination of a long process that started when RFC 1014, an informational RFC for XDR, was published in 1987. For me, the process started in 1997, when Bill Janssen and I submitted implementation reports showing that XDR qualified as a Draft Standard.

New IETF full Standards are rare beasts these days. RFC 4506 is assigned Standard number 67. Standard number 66 - RFC 3986 - was published in January of last year.

Thanks to Bob Lyon for inventing XDR. And thanks to
Kevin Coffman, Benny Halevy, Jon Peterson, Peter Astrand and Bryan Olson for helping to cross the Ts and dot the Is on the final document.

NFSv3 Exclusive Create and NTFS Qtrees

Tue, 11 Apr 2006 21:43:00 +0000

A customer was recently having trouble using NTFS qtrees in Data ONTAP when using NFSv3 or NFSv4 to gunzip some files. No such problem with NFSv2. It was narrowed down to the fact that gunzip, or at least the gunzip being used by the customer, creates files with the exclusive create flag set.

A file created by the open() system call with the O_EXCL flag present tells the kernel (UNIX or Linux) that if the specified file already exists, it should return an error; otherwise, it should create the file. This allows applications that want to use lock files to work correctly. However, NFSv2 doesn't do anything special with exclusive create; its CREATE procedure is used for both exclusive and non-exclusive create. If the file already exists, then NFSv2 CREATE just returns success from the NFSv2 server to the NFSv2 client. NFSv2 clients simulate the O_EXCL semantic by doing an over-the-network NFSv2 LOOKUP procedure to see if the file exists; if it does, they return an error to the process attempting the open(); otherwise, they issue the CREATE and return the result from the NFSv2 server for the CREATE (which will likely be success, barring permissions issues, out-of-space issues, or other issues). Clearly this isn't useful for creating lock files from multiple NFS clients, because two clients could both find that a file does not exist, both issue the CREATE operation, and both get success.

Enter NFSv3 CREATE. The designers of NFSv3 (BTW, I'm a credited designer, but I can't take any credit for NFSv3 CREATE) produced a very clever yet simple algorithm for implementing exclusive create.
Here are the arguments to NFSv3 CREATE:

    CREATE3res NFSPROC3_CREATE(CREATE3args) = 8;

    enum createmode3 {
        UNCHECKED = 0,
        GUARDED   = 1,
        EXCLUSIVE = 2
    };

    union createhow3 switch (createmode3 mode) {
    case UNCHECKED:
    case GUARDED:
        sattr3 obj_attributes;
    case EXCLUSIVE:
        createverf3 verf;
    };

    struct CREATE3args {
        diropargs3 where;
        createhow3 how;
    };

The key thing to understand is that if a non-exclusive create is done, the client provides an initial set of attributes, most likely consisting of the permission bits. However, if an exclusive create is done, the client provides no attributes, but does offer a 64-bit verifier. What happens in an exclusive CREATE is that the verifier is recorded in one of the new file's attributes. If for some reason the client has to retry the request, due to a timeout or server reboot, the retry uses the same verifier. Because the verifier in the request matches what is stored in the file, the server returns success to the client, rather than an NFS3ERR_EXIST error. If another client tries to do an exclusive CREATE around the same time, its verifier won't match what the server has recorded in the file, and so the other client gets NFS3ERR_EXIST. So now we have a perfect implementation of POSIX exclusive file create semantics.

But we aren't quite done, because recall that the client didn't get to set the desired permission bits. The NFSv3 protocol requires the "winner" of the exclusive create to follow up with an NFSv3 SETATTR operation to set all the attributes, including the mode bits.

Here is where we get into trouble with NTFS qtrees in ONTAP. With an NTFS qtree, CIFS and CIFS alone owns the security attributes of a file. So when the NFSv3 client issues the SETATTR to set things like owner, group, and mode bits, ONTAP returns an error. This causes an error to be returned to the process on the NFSv3 client that issued the open() with the O_EXCL|O_CREAT flags.

NFSv4 uses a[...]
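The exclusive-create algorithm described above can be sketched in a few lines. This is a hypothetical server-side illustration (the function and variable names are invented, and real servers store the verifier in on-disk file attributes, not a dict):

```python
# Sketch of the NFSv3 EXCLUSIVE create algorithm: the server records the
# client's verifier in the new file's attributes. A retransmitted CREATE
# carries the same verifier and so returns success; a different client's
# CREATE carries a different verifier and gets NFS3ERR_EXIST.

files = {}  # filename -> stored createverf3 (stands in for a file attribute)

def create_exclusive(name: str, verf: bytes) -> str:
    if name in files:
        # Matching verifier means this is a retry of a CREATE we already
        # executed, so report success rather than NFS3ERR_EXIST.
        return "NFS3_OK" if files[name] == verf else "NFS3ERR_EXIST"
    files[name] = verf  # record the verifier with the new file
    return "NFS3_OK"

print(create_exclusive("lockfile", b"client-A"))  # NFS3_OK: file created
print(create_exclusive("lockfile", b"client-A"))  # NFS3_OK: retry detected
print(create_exclusive("lockfile", b"client-B"))  # NFS3ERR_EXIST: lost the race
```

Note how the verifier does double duty: it disambiguates a retransmission from a genuine collision without the server keeping any per-request state.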

Connectathon 2006

Sat, 11 Mar 2006 16:18:00 +0000

I was at Connectathon last week, and gave a presentation, "NFS over TCP, Again". The slides are now posted at the Connectathon web site. The material should hopefully be self-explanatory, but I'll annotate some of it here based on the questions and discussions.

Slide 3 asks "Why NFS/TCP?" In addition to the reasons I gave, Max Matveev pointed out that even though both TCP and UDP have the same weak 16-bit checksum algorithm (a topic discussed in more depth by Alok Aggarwal), it turns out NFS over UDP/IP is much more prone to data corruption than over TCP/IP. NFS needs to send requests and responses that exceed the Maximum Transmission Unit (MTU) of the network media used between the NFS client and server. TCP does this by breaking the NFS message into segments which fit into the MTU. UDP does this by breaking the NFS message into IP fragments that each fit into the MTU. With TCP, each segment has a unique sequence number. With UDP, each fragment of a datagram shares a per-datagram 16-bit identifier, but has a unique fragment offset to indicate the fragment's place in the datagram.

Let's say we are using NFS/UDP, and an NFS WRITE request is sent at time T, with datagram identifier X. The request is broken into N fragments. The first fragment is lost in transmit somewhere, but the server receives the last N-1 fragments and holds them until it gets the first fragment, or the time-to-live (TTL) timer on each of the fragments expires. Meanwhile, the client is busy doing other NFS/UDP things, and the datagram identifier gets re-used. The identifier is just 16 bits; assuming 32-kilobyte writes and gigabit/sec transmission speeds, 2^16 * 32 * 1024 * 8 / 1000^3 is just 17.2 seconds. If the TTL is greater than 17 seconds, then the re-use of the identifier for another 32-Kbyte NFS WRITE will result in the first fragment of the new NFS request being used as the first fragment of the old NFS request.
That first fragment has some interesting stuff in it, such as the file handle and the offset into the file. If the file handles are different, then we are writing data for one file into another file. That's a security hole and a data corruption. If the file handles are the same, and the offsets are different, then we get data corruption. If the file handles and offsets are the same, we can still get data corruption, because 17 seconds ago a retry of the first NFS WRITE might have succeeded with no transmission loss, and this new NFS WRITE request is an intentional overwrite (say, a database record update). I admit to never having encountered the above myself, but I'd long since given up on NFS/UDP back when ethernet was a lot slower.

Slide 4 says that in Linux, the NFS/UDP total timeout is about a minute. Someone, who shall go nameless to protect the guilty, challenged that. After the presentation, I did a quick experiment with a Linux client that is running:

    2.6.11-1.27_FC3 #1 Tue May 17 20:27:37 EDT 2005 i686 i686 i386 GNU/Linux

At 16:07:39 I tried to do an ls of an NFSv3/UDP mount point to a dead NFS server, and collected a packet trace. The packet trace showed retransmissions at relative time offsets (in seconds) of 9.9, 19.8, 39.7, 1.1, 2.1, 4.3, 8.8, 16.6, 35.1, 1.1, 2.2, 4.4, 8.8, etc. At 16:07:48, the messages log wrote "server not responding". The initial timeout appears to be 10 seconds [not 100 milliseconds as I claimed in the slide], and the overall "call" timeout is about a minute (9.9+19.8+39.7 = 69.4 secs ~= 16:07:[...]
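The identifier-reuse arithmetic above is worth making explicit. The numbers here are the same ones used in the post (16-bit IP identification field, 32-kilobyte WRITEs, gigabit/sec link):

```python
# Wraparound arithmetic for the IP datagram identifier under NFS/UDP:
# how long until a 16-bit identifier is reused, given 32 KB WRITEs
# at gigabit/sec transmission speed.
ids = 2 ** 16                 # 16-bit IP identification field
write_bits = 32 * 1024 * 8    # one 32 KB WRITE, in bits
link_bps = 1000 ** 3          # gigabit per second

reuse_seconds = ids * write_bits / link_bps
print(round(reuse_seconds, 1))  # 17.2
```

At 100 Mbit/sec the same calculation gives about 172 seconds, which is why the problem was far less pressing on slower ethernet.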

Real Authentication in NFS

Fri, 24 Feb 2006 15:44:00 +0000

I get asked a lot about what can be done to prevent the following:

    nfs client% cd /home/jim
    /home/jim: Permission denied
    nfs client% ls -ld /home/jim
    drwx------ 2 jim grp1 117 Feb 24 07:48 /home/jim/
    nfs client% su
    Password:
    nfs client# su jim
    nfs client% cd /home/jim
    /home/jim

What is happening is that user jim has set the permissions on his data to 0700, meaning only he, the owner, should get access. But someone on the NFS client with knowledge of the super-user password can become root (user id 0), then become jim, and circumvent jim's protections. The reason this works is that the NFS server is accepting AUTH_SYS credentials, which are, basically, a user id and 1 to 17 group ids. Simply su'ing to jim causes the NFS client in the kernel to pick up jim's user id and group ids.

Some people have suggested that if a more secure directory service like LDAP is used, especially if it's configured to use Kerberos V5 authentication, then this provides Kerberos authentication and so will defeat the attack. No, that is not the case. All that does is make sure the user using LDAP is authenticated via Kerberos (and the LDAP server is authenticated to the user via Kerberos). While this is a good thing, it does absolutely nothing to prevent the scenario above.

The only thing today that prevents the scenario is to use Kerberos V5 authentication (or some other strong authentication system, but Kerberos V5 is what most vendors have) in the NFS traffic itself. This means exporting the volume with option sec=krb5 (or krb5i, or krb5p), and without anon=0 and without root=. What happens is that even if the attacker su's to jim, unless he knows jim's Kerberos password, he cannot become user jim over the NFS connection.
Even attempting to access /home/jim as super-user, even with Kerberos credentials for super-user, is defeated, because super-user, uid 0, will be mapped to user nobody (since anon=0 and root= are absent in the export options).

Restricting knowledge of the super-user password, while an excellent practice, is no panacea either. This is because synthetic, user-level NFS clients aren't rocket science to write, and they can be written so that any uid can be specified in the AUTH_SYS credential. There are "nfs shell" programs out there for anyone to download. While the one I've tried isn't written to allow arbitrary user ids to be inserted into the credentials of the NFS requests, it wouldn't be hard to change it.

You might find the following links interesting:

My presentation in 2003 at the NFS Industry Conference on NFS security

NetApp's TR on NFS security[...]
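To see why a user-level client can claim any uid, it helps to look at what an AUTH_SYS credential actually is on the wire. Per the ONC RPC specification it is just XDR-encoded, client-asserted data: a stamp, a machine name, a uid, a gid, and up to 16 auxiliary gids. This sketch builds one with the standard library; the machine name and uid are made up for illustration, and nothing here talks to a real server:

```python
# Sketch of an AUTH_SYS credential body (authsys_parms from the ONC RPC spec):
# stamp, machine name, uid, gid, auxiliary gids -- all XDR-encoded, and none
# of it verified by the server. A synthetic client can claim any uid it likes.
import struct

def xdr_uint(n):
    return struct.pack(">I", n)          # XDR unsigned int: 4 bytes big-endian

def auth_sys_cred(stamp, machine, uid, gid, gids):
    name = machine.encode("ascii")
    pad = (4 - len(name) % 4) % 4        # pad string to 4-byte alignment
    return (xdr_uint(stamp)
            + xdr_uint(len(name)) + name + b"\x00" * pad
            + xdr_uint(uid) + xdr_uint(gid)
            + xdr_uint(len(gids)) + b"".join(xdr_uint(g) for g in gids))

# Claiming to be jim (a hypothetical uid 1234) takes no password at all:
cred = auth_sys_cred(0, "evilhost", 1234, 1234, [1234])
print(len(cred) % 4 == 0)  # True: well-formed XDR, and trivially forged
```

The server has no way to tell this forged credential from one built by a legitimate kernel client, which is exactly why only in-band Kerberos (sec=krb5 and friends) closes the hole.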