Tom Haynes on Exploring NFSv4

Disclaimer: I used to be an NFS engineer for NetApp.

Disclaimer: I am an NFS engineer for Sun.


Version  RFC
NFSv4    RFC 3530
NFSv3    RFC 1813
NFSv2    RFC 1094

Some links:

NFSv4.org

Linux NFS

Learning NFSv4 with Fedora Core 2 (Linux 2.6.5 kernel)


Some NFSv4 offerings are brought to us via:

CITI and Open Reference

Sun

NetApp

IBM

Hummingbird

EMC



Now being maintained at Kool Aid Served Daily

Here is my South Park avatar:

I think I got there via this site. Yep, it looks like the original author is Janina (AKA ZiB/Zwerg-im-Bikini) at www.planearium2.de.

My South Park persona. ;>

I took the geek test:

I scored a 35.89744% on the test, so I am a Major Geek.


Sun Swag - 01/10/2006

I am back from Maui - I didn't drown.

I haven't worked on wont, it is still sucking on a WinXP license. I started with Sun and I do not yet have remote access. The current Nevada build bits are 12/05 and they won't install.

I thought FedEx had delivered my replacement drive, since NewEgg.com shows it is out for delivery. But no, it was Sun swag. Pretty nice and unexpected. I wonder how they select the size T-shirt you are going to get. I got an L, and anyone who knows me would at least pick an XL if not an XXL.

I want to migrate this to a real blog offering. I might discontinue this altogether and start one up on Blogs.sun.com! I could install the Roller Weblogger directly, but for right now my gateway box is running Fedora Core 4, and I don't like the complicated way to get Java running correctly on Core 4.


Pig Will and Pig Wont - 12/29/2005

No reply on how to fix the bug described in this OpenSolaris help thread. I RMA'ed the bad drive today and I hope to pick it up when I get back from Maui. I'll also be working for Sun then, so I'll have access to newer builds and hopefully the needed fix.


Pig Will and Pig Wont - 12/28/2005

Okay, I powered wont down last night and I've hit the nVidia problem with booting SATA drives.

The short of it is that Solaris won't boot - period - with the SATA drives. I ended up installing WinXP and reformatting the SATA drives. Perhaps it was extreme, but I need to get Solaris VTOCs on them and I had nothing invested in terms of data on the machine. By the way, MSI supplies two identically screened CDs with their motherboards - good luck finding out which one has the needed drivers. Plus you need DirectX 9 something or another to get their drivers installed.

That, along with flying a B2 RC plane for 10 minutes, has been the entire day. I'm hoping that the reinstall of Nevada goes smoothly.


Pig Will and Pig Wont - 12/27/2005

I've decided to call my new Opteron system wont, after some characters by Richard Scarry. I've had identical systems before, one which was easy and one which was hard. One became will and the other wont. Those systems have since been donated away. This new system is hard, so it becomes the new wont:

  • It has PCI-express and my plan to reuse an AGP card went awry.
  • The Antec case I bought did not have a speaker.
  • I bought a post card to see what was going on.
  • I got code 26, which was undocumented, but probably the keyboard, which was known to be good. So off went the MB back to NewEgg.com.
  • The next MB had the exact same problem, but this time, I asked myself if it was 26 or 2b? A quick google showed me it was probably a 2b.
  • So it had to be the CPU, right?
  • You can find Athlons by the boatload in Tulsa, but not Opterons.
  • By the way, I've been using a grounding strap. I got the crap zapped out of me getting out of my Silverado today.
  • NewEgg.com does not do next morning and it was too late in the day to even get the next day.
  • Okay, what if it was the video card? Off went my tech support question to MSI. They suggested I contact the post card vendor to find out what 2b meant. At least they replied promptly.
  • CompUSA had a cheap, $29, PCI card available. I dropped that puppy in the available PCI slot and the system booted. Hooray! Now the real fun can begin.
  • Luckily, it is the end of the holiday season, so CompUSA took back the expensive PCI-express video card. (Note, the VisionTek XTASY Radeon X700 is not compatible with the K8N Master2-FAR.)
  • When I put the MB in the case and added the cheap PCI card, it blocked the firewire, floppy drive, and 2 out of 3 USB connectors. It is also a real tight fit and I'm not sure about the heat flow. Looks like I'll be looking for a compatible video card soon.
  • One of the SATA drives had its plastic pin support for the data connector yanked out when I was troubleshooting the last MB (which must have just been fine). So that is another RMA to NewEgg.com.
  • The serial cables for the CDROM and the 40G boot drive are some really streamlined cables for airflow. They also suck as far as staying connected. I've pulled some old cables from a dead Shuttle SS51G I had lying about.

Okay, Nevada B27 is busily loading on the machine. It sees all 3 of the SATA drives (Western Digital Caviar SE WD2500JS 250GB 7200 RPM SATA 3.0Gb/s).

I chose the MB because I scoured OpenSolaris.org and the BigAdmin HCL. I also visited some Sun Employee Blogs. Another good resource was The Blog of Ben Rockwood.

The CPU fan is loud, I don't have the full case on yet. That happens after the install.


Leaving NetApp - 12/29/2005

NetApp people would always ask me why I was going to Sun. I'd tell them to work on NFSv4 fulltime and also to get involved in OpenSolaris. But really, fortune cookies told me to go to Sun:

Concentrate on the good moments and ignore the bad.
You are soon going to change your present line of work.
You are offered the dream of a lifetime. Say yes!
Your determination will bring you much success.
Use your talents.
That's what they are intended for.

I've been collecting them in my wallet and they do chronicle my journey from NetApp to Sun. I'll try and see if they follow suit once I actually start at Sun.


Leaving NetApp - 12/24/2005

I've officially left NetApp after 6 years. I'm leaving a great company and a lot of friends. But I know the people I'm going to be working with at Sun and they are a good bunch as well. I'm also very excited to be able to contribute to OpenSolaris.

I can't log in to any of the NetApp boxes. I've shipped all of their gear I had back to them. I start with Sun on 01/09/06. Until then, I only get to play with my boxes.

I ordered an MSI K8N Master2-FAR Socket 940 motherboard from NewEgg.com and a bunch of disks to build a NAS box based on OpenSolaris, ZFS, and NFSv4. I've had to RMA it. I really liked the MB, the Thermaltake PurePower 680W, and the Antec P180 Advanced Super Mid Tower. But the CPU fans in the K8N are loud; I'll have to see how it sounds with the sides on. The case fans sound quiet. By the way, the old trick of stopping a fan with your thumb to isolate the loudness does not work on modern CPU fans - I broke a blade and tore a chunk out of my thumb.

I also did not like that the Antec P180 did not come with a case speaker. I understand it is for servers and servers do not normally ship with sound. But, until I bought a post board, I had no clue as to which component was iced up. I was getting a post of 26 on the Phoenix BIOS. I swapped keyboards, and since I knew both were good, I ended up RMAing my MB.

I like NewEgg.com; I find it amusing that they actually advertise in the San Jose Airport. I thought buying the Antec locally would be a savings, especially with shipping, but no, I ended up paying more at CompUSA. The service at CompUSA has been bad this week. An exchange on a PCI express graphics card went smoothly, but getting help to get anything back in the shelves area was like pulling teeth. As a matter of fact, the only thing CompUSA did beat NewEgg.com on was the graphics card.

Until I get the new MB, I've installed Nevada on my Shuttle SS51G (mrx). Except for not installing from the USB-CDROM, it went pretty smoothly. The graphics are shifted from when the box was running Fedora Core 4, but I can work that out. I've got Nevada running on the Ultra 5 (sandman) and the Dell Dimension 4600 (will). Note on the Dell, you have to add in entries for the onboard ethernet:

add to /etc/driver_aliases:
iprb "pci8086,1050"

devfsadm -v
ifconfig -a plumb
/usr/sbin/sys-unconfig
reboot

I also have a PATA 120G Maxtor drive in it. Getting it up and running has really changed since Solaris 10. In Solaris 10, the following would work:

vi /boot/solaris/devicedb/master  and add:
pci1095,680 pci-ide msd pci ata.bef "Silicon Image 680 PATA Controller"

touch /reconfigure
reboot

Because of grub, that no longer works in Nevada. I tried the following first:

vi /boot/grub/menu.lst

change:
kernel /platform/i86pc/multiboot
to:
kernel /platform/i86pc/multiboot -B "pci-ide=pci1095,680"

And that didn't work. I had to remove it and do:

eeprom pci-ide=pci1095,680

And then I think the moon passed the old man on the scooter and it worked. Shrug

I added the Maxtor in order to play with ZFS. Note that right now Nevada (B27) does not allow you to have ZFS for your boot partitions.

To enable ZFS, I had to first go into format and partition to blow away the old UFS offerings I had there.

A lot of work considering I'm going to dismantle the Dell and give it to my sister. Her system had a nasty accident with some hair in the power supply.


Getting started with NFSv4 - 11/10/2005


Solaris 10 Server - 11/10/2005

I've got two systems to play with for running NFSv4. They are both freshly installed, one running Fedora Core 4 and the other running Solaris 10:

# uname -a
SunOS will 5.10 Generic i86pc i386 i86pc
[nfsv4@mrx ~]$ uname -a
Linux mrx.excfb.com 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686 i386 GNU/Linux

I'll bring in other systems as needed.

I've created 3 different users for testing:

# useradd -u 1094 -g 100 -m -d /export/home/nfsv2 nfsv2
# useradd -u 1813 -g 100 -m -d /export/home/nfsv3 nfsv3
# useradd -u 3530 -g 100 -m -d /export/home/nfsv4 nfsv4

For right now, all of them live locally to each system.

I've got DNS configured for the domain: excfb.com, but no other name services.

The first thing to do is to create an export on will:

# share -F nfs -o sec=sys,rw,anon=0 -d "Cat Scratch Fever" /loghyr

Note that the sec=sys has to appear before the other options. I can confirm that the export is there via:

# share
-               /loghyr   sec=sys,rw,anon=0   "Cat Scratch Fever"

Also, if I want to make sure this export works across reboots, I need to add it to /etc/dfs/dfstab. Finally, I have to make sure to enable NFS on will:

# svcadm enable svc:/network/nfs/server
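
For the reboot-proof part, /etc/dfs/dfstab holds one share command per line, so the command used above can be appended as-is. A minimal sketch, assuming a POSIX grep (on Solaris 10 that would be /usr/xpg4/bin/grep rather than /usr/bin/grep, as far as I know); the helper name add_share is made up here, and it takes the file as an argument so it can be tried on a scratch copy first:

```shell
# add_share FILE LINE: append LINE to FILE unless an identical line is
# already present. FILE would be /etc/dfs/dfstab on the server.
add_share() {
    # -x: whole-line match, -F: fixed string, -q: no output
    if ! grep -qxF -e "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# Try it against a scratch file rather than the real dfstab.
scratch=${TMPDIR:-/tmp}/dfstab.$$
add_share "$scratch" 'share -F nfs -o sec=sys,rw,anon=0 -d "Cat Scratch Fever" /loghyr'
add_share "$scratch" 'share -F nfs -o sec=sys,rw,anon=0 -d "Cat Scratch Fever" /loghyr'
cat "$scratch"      # the share line shows up only once
rm -f "$scratch"
```

Running it twice is harmless, which is the point: the file never collects duplicate share lines.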

Okay, can mrx mount that export? First we need to make sure will is exporting it, really:

[root@mrx ~]# showmount -e will
Export list for will:
/loghyr (everyone)

Now we need to mount that export:

[root@mrx ~]# mount will:/loghyr /nfs/will/loghyr

Did it work?

[nfsv4@mrx ~]$ mount | grep will
will:/loghyr on /nfs/will/loghyr type nfs (rw,addr=192.168.2.103)

Yes and no. Yes, the mount worked, but we have an NFSv3 mount and not an NFSv4 mount. How can we confirm that? With snoop!

From the client:

[nfsv4@mrx ~]$ ls -la /nfs/will/loghyr
total 63
drwxr-xr-x  10 root root    512 Nov 10 11:02 .
drwxr-xr-x   3 root root   4096 Nov  9 19:58 ..
drwxr-xr-x   3 root root    512 Nov  9 16:37 fedora
drwxr-xr-x  11 root root    512 Apr  4  2005 home
drwxr-xr-x   3 root root    512 Mar 12  2005 html
-rw-r--r--   1 root root  34654 Nov  9 17:10 httpd.conf
drwxr-xr-x   2 tdh  wheel   512 Nov 10 00:58 isos
drwx------   2 root root   8192 Nov  7 21:27 lost+found
drwxr-x---   4 root named   512 Dec  2  2004 named
-rw-r--r--   1 root root   8793 Nov  9 02:24 readme.txt
drwxr-xr-x   3 root root    512 Nov 10 10:58 Solaris10
drwxr-xr-x   2 root root    512 Nov 10 00:47 spool

Yields (note that I stripped out some TCP fragments for clarity):

# snoop will mrx
Using device /dev/iprb0 (promiscuous mode)
mrx.excfb.com -> will         NFS C GETATTR3 FH=DA01
        will -> mrx.excfb.com NFS R GETATTR3 OK
mrx.excfb.com -> will         NFS C ACCESS3 FH=DA01 (read,lookup,modify,extend,delete)
        will -> mrx.excfb.com NFS R ACCESS3 OK (read,lookup)
mrx.excfb.com -> will         NFS C READDIRPLUS3 FH=DA01 Cookie=0 for 512/4096
        will -> mrx.excfb.com NFS R READDIRPLUS3 OK 9+ entries (incomplete)
mrx.excfb.com -> will         NFS C GETATTR3 FH=DA01
        will -> mrx.excfb.com NFS R GETATTR3 OK

Time to do the same experiment from another Solaris 10 client:

# uname -a
SunOS ultralord 5.10 Generic sun4u sparc SUNW,Ultra-5_10
# mount will:/loghyr /nfs/will/loghyr
# mount | grep will
/nfs/will/loghyr on will:/loghyr remote/read/write/setuid/devices/xattr/dev=46c0003 on Thu Nov 10 14:13:58 2005
# ls -la /nfs/will/loghyr
total 120
drwxr-xr-x  10 root     root         512 Nov 10 11:02 .
drwxr-xr-x   3 root     root         512 Nov  8 23:06 ..
drwxr-xr-x   3 root     root         512 Nov 10 10:58 Solaris10
drwxr-xr-x   3 root     root         512 Nov  9 16:37 fedora
drwxr-xr-x  11 root     root         512 Apr  4  2005 home
drwxr-xr-x   3 root     root         512 Mar 12  2005 html
-rw-r--r--   1 root     root       34654 Nov  9 17:10 httpd.conf
drwxr-xr-x   2 tdh      staff        512 Nov 10 00:58 isos
drwx------   2 root     root        8192 Nov  7 21:27 lost+found
drwxr-x---   4 root     smmsp        512 Dec  2  2004 named
-rw-r--r--   1 root     root        8793 Nov  9 02:24 readme.txt
drwxr-xr-x   2 root     root         512 Nov 10 00:47 spool

Note that from the mount output, we do not have a clue as to which NFS version we are using here. We go back to the server and use snoop:

# snoop will ultralord
Using device /dev/iprb0 (promiscuous mode)
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=3881 GETATTR 10011a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=3881 GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (access      ) PUTFH FH=3881 ACCESS rd,lk,mo,ext,dl GETATTR 10011a b0a23a
        will -> ultralord.excfb.com NFS R 4 (access      ) NFS4_OK PUTFH NFS4_OK ACCESS NFS4_OK Supp=rd,lk,mo,ext,dl
                                                           Allow=rd,lk,mo,ext,dl GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=3881 GETATTR 10011a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (readdir     ) PUTFH FH=3881 READDIR Cookie=0 (0000000000000000) for
                                                           8192/1048576
        will -> ultralord.excfb.com NFS R 4 (readdir     ) NFS4_OK PUTFH NFS4_OK 
        will -> ultralord.excfb.com RPC R (#0) XID=0 Program number mismatch (low=1575194, high=11575866)
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=4109 GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=43B5 GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=48FC GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=65F6 GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=4874 GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=5D5C GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=603B GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=6382 GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=395A GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=5269 GETATTR 10111a b0a23a
        will -> ultralord.excfb.com NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
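
The two traces differ in a way that is easy to script against: the v3 procedures carry the version as a suffix (GETATTR3, ACCESS3), while the v4 compounds show a bare 4 after the C or R. A rough sketch, assuming snoop summary lines in exactly the format captured here and a POSIX awk (nawk on Solaris); the function name snoop_nfs_versions is invented for illustration:

```shell
# snoop_nfs_versions: read snoop summary lines on stdin and print each
# NFS version seen, one per line.
snoop_nfs_versions() {
    awk '
        # NFSv4 compounds look like "NFS C 4 (...)" or "NFS R 4 (...)"
        / NFS [CR] 4 / { seen[4] = 1; next }
        # Older procedures carry the version as a suffix: GETATTR3, NULL4, ...
        / NFS [CR] [A-Z]+[234]( |$)/ {
            match($0, / NFS [CR] [A-Z]+[234]/)
            seen[substr($0, RSTART + RLENGTH - 1, 1)] = 1
        }
        END { for (v = 2; v <= 4; v++) if (v in seen) print v }
    '
}

# Two lines lifted from the traces above: one from mrx, one from ultralord.
printf '%s\n' \
    'mrx.excfb.com -> will         NFS C GETATTR3 FH=DA01' \
    'ultralord.excfb.com -> will         NFS C 4 (getattr     ) PUTFH FH=3881 GETATTR 10011a b0a23a' |
    snoop_nfs_versions
```

On a live server the input would come from something like snoop -i /tmp/snoop on a saved capture.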

So we see NFSv4 traffic. Why the difference in behaviour between Linux and Solaris?

The Solaris NFS server and client are by default configured to negotiate the version of NFS to use, and they will try the highest available version first. This can be changed in /etc/default/nfs:

# Sets the minimum version of the NFS protocol that will be registered
# and offered by the server.  The default is 2.
#NFS_SERVER_VERSMIN=2

# Sets the maximum version of the NFS protocol that will be registered
# and offered by the server.  The default is 4.
#NFS_SERVER_VERSMAX=4

# Sets the minimum version of the NFS protocol that will be used by
# the NFS client.  Can be overridden by the "vers=" NFS mount option.
# The default is 2.
#NFS_CLIENT_VERSMIN=2

# Sets the maximum version of the NFS protocol that will be used by
# the NFS client.  Can be overridden by the "vers=" NFS mount option.
# If "vers=" is not specified for an NFS mount, this is the version
# that will be attempted first.  The default is 4.
#NFS_CLIENT_VERSMAX=4

# Determines if the NFS version 4 delegation feature will be enabled
# for the server.  If it is enabled, the server will attempt to
# provide delegations to the NFS version 4 client. The default is on.
#NFS_SERVER_DELEGATION=on

# Specifies to nfsmapid daemon that it is to override its default
# behavior of using the DNS domain, and that it is to use 'domain' as
# the domain to append to outbound attribute strings, and that it is to
# use 'domain' to compare against inbound attribute strings.
#NFSMAPID_DOMAIN=domain
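
Every setting in that listing is commented out, so the built-in default applies until a line is uncommented. A small sketch for checking what a key is effectively set to; nfs_default is a hypothetical helper, and the fallback value has to be supplied by the caller from the comments in the file:

```shell
# nfs_default FILE KEY FALLBACK: print the value from an uncommented
# KEY=value line in FILE, or FALLBACK when the key is missing or still
# commented out. The last uncommented occurrence wins.
nfs_default() {
    value=$(sed -n "s/^[[:space:]]*$2=//p" "$1" 2>/dev/null | tail -1)
    printf '%s\n' "${value:-$3}"
}

# Scratch copy with one key left commented and one overridden.
scratch=${TMPDIR:-/tmp}/nfsdefaults.$$
printf '%s\n' '#NFS_CLIENT_VERSMAX=4' 'NFS_SERVER_VERSMAX=3' > "$scratch"
nfs_default "$scratch" NFS_CLIENT_VERSMAX 4    # commented out, falls back to 4
nfs_default "$scratch" NFS_SERVER_VERSMAX 4    # uncommented, prints 3
rm -f "$scratch"
```

Against the real file the call would be nfs_default /etc/default/nfs NFS_CLIENT_VERSMAX 4.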

The Linux community decided to not follow this same model.

In any event, NFSv4 is treated as a different vfstype than either NFSv2 or NFSv3. Both of those protocols are the vfstype of nfs, whilst NFSv4 is the vfstype of nfs4. We can see this with the example:

[root@mrx ~]# mount -t nfs4 will:/loghyr /nfs4/will/loghyr
[root@mrx ~]# mount | grep will
will:/loghyr on /nfs/will/loghyr type nfs (rw,addr=192.168.2.103)
will:/loghyr on /nfs4/will/loghyr type nfs4 (rw,addr=192.168.2.103)
[root@mrx ~]# ls -la /nfs4/will/loghyr
total 63
drwxr-xr-x  10 nobody nobody   512 Nov 10 11:02 .
drwxr-xr-x   3 root   root    4096 Nov  9 21:53 ..
drwxr-xr-x   3 nobody nobody   512 Nov  9 16:37 fedora
drwxr-xr-x  11 nobody nobody   512 Apr  4  2005 home
drwxr-xr-x   3 nobody nobody   512 Mar 12  2005 html
-rw-r--r--   1 nobody nobody 34654 Nov  9 17:10 httpd.conf
drwxr-xr-x   2 nobody nobody   512 Nov 10 00:58 isos
drwx------   2 nobody nobody  8192 Nov  7 21:27 lost+found
drwxr-x---   4 nobody nobody   512 Dec  2  2004 named
-rw-r--r--   1 nobody nobody  8793 Nov  9 02:24 readme.txt
drwxr-xr-x   3 nobody nobody   512 Nov 10 10:58 Solaris10
drwxr-xr-x   2 nobody nobody   512 Nov 10 00:47 spool
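
On the Linux client the vfstype column is therefore enough to tell a v4 mount from a v2/v3 one, no snoop needed. A sketch that pulls it out of mount output in the format shown above; mount_type is an illustrative name, not an existing tool:

```shell
# mount_type MOUNTPOINT: read Linux "mount" output on stdin and print the
# vfstype of MOUNTPOINT (lines look like "dev on dir type fstype (opts)").
mount_type() {
    awk -v mp="$1" '$2 == "on" && $3 == mp && $4 == "type" { print $5 }'
}

# The two mounts from the listing above.
printf '%s\n' \
    'will:/loghyr on /nfs/will/loghyr type nfs (rw,addr=192.168.2.103)' \
    'will:/loghyr on /nfs4/will/loghyr type nfs4 (rw,addr=192.168.2.103)' |
    mount_type /nfs4/will/loghyr
```

In practice the input would simply be piped in: mount | mount_type /nfs4/will/loghyr.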

And snoop shows:

# snoop will mrx
Using device /dev/iprb0 (promiscuous mode)
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 ACCESS rd,lk,mo,ext,dl
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK ACCESS NFS4_OK Supp=rd,lk,mo,ext,dl Allow=rd,lk,mo,ext,dl
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 READDIR Cookie=0 (0000000000000000) for 2008/4016
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK READDIR NFS4_OK 10 entries (No more)
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 LOOKUP lost+found GETFH GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=4109 GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 LOOKUP html GETFH GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=43B5 GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 LOOKUP home GETFH GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=48FC GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 LOOKUP fedora GETFH GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=65F6 GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 LOOKUP httpd.conf GETFH GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=4874 GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 LOOKUP isos GETFH GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=5D5C GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 LOOKUP Solaris10 GETFH GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=603B GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 LOOKUP named GETFH GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=6382 GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 LOOKUP readme.txt GETFH GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=395A GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () PUTFH FH=3881 LOOKUP spool GETFH GETATTR 10011a 30a23a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=5269 GETATTR NFS4_OK
mrx.excfb.com -> will         NFS C 4 () RENEW CL=654373a60a
        will -> mrx.excfb.com NFS R 4 () NFS4_OK RENEW NFS4_OK

Now the big question: why are the user and group showing up as nobody in the NFSv4 listing of will's export?

The answer is that in NFSv4, ownership is given as a string and not as a numeric ID. If we delve a bit deeper into the snoop trace (captured via snoop -x0,2000 -o /tmp/snoop will mrx), we see:

In the RPC layer of the request:

RPC:  ----- SUN RPC Header -----
RPC:
RPC:  Record Mark: last fragment, length = 176
RPC:  Transaction id = 681346145
RPC:  Type = 0 (Call)
RPC:  RPC version = 2
RPC:  Program = 100003 (NFS), version = 4, procedure = 1
RPC:  Credentials: Flavor = 1 (Unix), len = 64 bytes
RPC:     Time = 5906
RPC:     Hostname = mrx.excfb.com
RPC:     Uid = 0, Gid = 0
RPC:     Groups = 0 1 2 3 4 6 10
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:

In the NFS layer of the reply:

NFS:  Op = 9 (GETATTR)
NFS:  Status = 0 (NFS4_OK)
NFS:    0x1a   SIZE CHANGE TYPE
NFS:    0x01   FSID
NFS:    0x10   FILEID
NFS:    0x00
NFS:    0x3a   OWNER_GROUP OWNER NUMLINKS MODE
NFS:    0xa2   TIME_ACCESS SPACE_USED RAWDEV
NFS:    0x30   TIME_MODIFY TIME_METADATA
NFS:    0x00
NFS:  Type = DIR
NFS:  Change ID = 0x43737d08080ada68
NFS:  Size = 512
NFS:  FS ID: Major = 66, Minor = 40
NFS:  File ID = 18624
NFS:  Mode = 0755
NFS:  Number of Links = 2
NFS:  Owner = tdh@excfb.com
NFS:  Group = staff@excfb.com
NFS:  Raw Device ID = 0, 0
NFS:  Space Used (this object) = 1024
NFS:  Last Access Time = 10-Nov-05 06:56:41.629919000 GMT
NFS:  Last Metadata Change Time = 10-Nov-05 17:02:00.134929000 GMT
NFS:  Last Modification Time = 10-Nov-05 06:58:28.590484000 GMT

The server has to take the uid and gid from the RPC layer of the request and map them to identify permissions. But when it sends back the Owner and Group in the GETATTR, it sends back strings. The client is responsible for mapping these back to a uid and gid (or to strings in its domain).

Why did ultralord get this right earlier when mrx did not? Again, the answer lies in the defaults selected by the implementors. Sun decided that their default ID domain would be prompted for during installation and defaults to the DNS domain. Linux decided to make the default always be localdomain.

We can modify Sun's ID domain by editing /etc/default/nfs:

# Specifies to nfsmapid daemon that it is to override its default
# behavior of using the DNS domain, and that it is to use 'domain' as
# the domain to append to outbound attribute strings, and that it is to
# use 'domain' to compare against inbound attribute strings.
#NFSMAPID_DOMAIN=domain

After we make a change, we can either reboot or have nfsmapid restarted:

# svcadm restart svc:/network/nfs/mapid

We can modify Linux's ID domain by editing /etc/idmapd.conf:

Domain = localdomain

After we make a change, we can either reboot or have idmapd restarted:

[root@mrx ~]# vi /etc/idmapd.conf
...
Domain = excfb.com
...
[root@mrx ~]# service rpcidmapd restart
Shutting down RPC idmapd:                                  [  OK  ]
Starting RPC idmapd:                                       [  OK  ]
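
The same edit can be scripted instead of done in vi. A minimal sketch, assuming the file already contains a Domain line (possibly commented out); set_idmap_domain is a made-up helper name, and rpcidmapd still has to be restarted afterwards as shown above:

```shell
# set_idmap_domain FILE DOMAIN: rewrite the "Domain = ..." line in FILE.
# A file without any Domain line is left unchanged.
set_idmap_domain() {
    tmp=${TMPDIR:-/tmp}/idmapd.$$
    # Write through cat so FILE keeps its owner and mode.
    sed "s/^[[:space:]]*#*[[:space:]]*Domain[[:space:]]*=.*/Domain = $2/" "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Try it on a scratch copy rather than /etc/idmapd.conf.
scratch=${TMPDIR:-/tmp}/idmapd.conf.$$
printf '%s\n' '[General]' 'Domain = localdomain' > "$scratch"
set_idmap_domain "$scratch" excfb.com
cat "$scratch"
rm -f "$scratch"
```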

Okay, let's check to see how that changes the results:

[root@mrx ~]# ls -la /nfs4/will/loghyr
total 63
drwxr-xr-x  10 nobody nobody   512 Nov 10 11:02 .
drwxr-xr-x   3 root   root    4096 Nov  9 21:53 ..
drwxr-xr-x   3 nobody nobody   512 Nov  9 16:37 fedora
drwxr-xr-x  11 nobody nobody   512 Apr  4  2005 home
drwxr-xr-x   3 nobody nobody   512 Mar 12  2005 html
-rw-r--r--   1 nobody nobody 34654 Nov  9 17:10 httpd.conf
drwxr-xr-x   2 nobody nobody   512 Nov 10 00:58 isos
drwx------   2 nobody nobody  8192 Nov  7 21:27 lost+found
drwxr-x---   4 nobody nobody   512 Dec  2  2004 named
-rw-r--r--   1 nobody nobody  8793 Nov  9 02:24 readme.txt
drwxr-xr-x   3 nobody nobody   512 Nov 10 10:58 Solaris10
drwxr-xr-x   2 nobody nobody   512 Nov 10 00:47 spool
[root@mrx ~]# umount /nfs4/will/loghyr
[root@mrx ~]# mount -t nfs4 will:/loghyr /nfs4/will/loghyr
[root@mrx ~]# ls -la /nfs4/will/loghyr
total 63
drwxr-xr-x  10 root root     512 Nov 10 11:02 .
drwxr-xr-x   3 root root    4096 Nov  9 21:53 ..
drwxr-xr-x   3 root root     512 Nov  9 16:37 fedora
drwxr-xr-x  11 root root     512 Apr  4  2005 home
drwxr-xr-x   3 root root     512 Mar 12  2005 html
-rw-r--r--   1 root root   34654 Nov  9 17:10 httpd.conf
drwxr-xr-x   2 tdh  nobody   512 Nov 10 00:58 isos
drwx------   2 root root    8192 Nov  7 21:27 lost+found
drwxr-x---   4 root smmsp    512 Dec  2  2004 named
-rw-r--r--   1 root root    8793 Nov  9 02:24 readme.txt
drwxr-xr-x   3 root root     512 Nov 10 10:58 Solaris10
drwxr-xr-x   2 root root     512 Nov 10 00:47 spool

Evidently some client side caching was going on. A remount cleared that up. Are we happy now? We still see a nobody as a group instead of staff.

The answer lies in the fact that we are not using a name service and relying on local mappings. On Solaris:

# grep staff /etc/group
staff::10:

On Linux:

[root@mrx ~]# grep staff /etc/group
[root@mrx ~]# grep ":10:" /etc/group
wheel:x:10:root

So, under NFSv3, we would see:

drwxr-xr-x   2 tdh  wheel   512 Nov 10 00:58 isos

Because we would get a gid back instead of staff@excfb.com. But under NFSv4, we cannot map staff to wheel.
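
As a mental model only (this is not how nfsmapid or rpc.idmapd are actually implemented, and the real daemons consult the name service rather than a flat file), the client-side decision can be sketched as: split the string at the @, compare the domain with the local one, then look the name up locally. The helper map_group and its arguments are invented for the illustration:

```shell
# map_group STRING LOCAL_DOMAIN GROUP_FILE
# Print the local group name for an NFSv4 "name@domain" string, or
# "nobody" when the domain differs or the name is unknown locally.
map_group() {
    name=${1%@*}       # text before the last @
    domain=${1##*@}    # text after the last @
    if [ "$domain" != "$2" ]; then
        echo nobody
    elif grep -q "^$name:" "$3" 2>/dev/null; then
        echo "$name"
    else
        echo nobody
    fi
}

# mrx's situation: gid 10 exists locally, but under the name wheel.
grpfile=${TMPDIR:-/tmp}/group.$$
printf '%s\n' 'root:x:0:root' 'wheel:x:10:root' > "$grpfile"
map_group staff@excfb.com localdomain "$grpfile"   # domain mismatch: nobody
map_group staff@excfb.com excfb.com "$grpfile"     # name unknown: nobody
map_group root@excfb.com excfb.com "$grpfile"      # maps cleanly: root
rm -f "$grpfile"
```

The first call corresponds to the listing before the idmapd.conf change, the second to the isos directory after it.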

This is not a difference between Linux and Solaris. We can add gid 1066 to both of the Solaris boxes:

Client

# uname -a
SunOS ultralord 5.10 Generic sun4u sparc SUNW,Ultra-5_10
# echo "battle::1066:" >>  /etc/group

Server

# uname -a
SunOS will 5.10 Generic i86pc i386 i86pc
# echo "hastings::1066:" >>  /etc/group
# touch /loghyr/norman
# chgrp 1066 /loghyr/norman
# ls -la /loghyr/norman
-rw-r--r--   1 root     hastings       0 Nov 10 15:13 /loghyr/norman

Now, what does ultralord display?

# ls -la /nfs/will/loghyr/norman
-rw-r--r--   1 root     nobody         0 Nov 10 15:13 /nfs/will/loghyr/norman

So, if there does not exist a mapping, then the string nobody is displayed.
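
Without a shared name service, one way to predict where nobody will turn up is to compare the two /etc/group files directly. A sketch, assuming copies of both files are at hand and a POSIX awk; group_mismatches is a hypothetical name:

```shell
# group_mismatches SERVER_GROUP CLIENT_GROUP
# For every gid present in both files under different names, print
# "gid server_name client_name".
group_mismatches() {
    awk -F: '
        NR == FNR { srv[$3] = $1; next }    # first file: name per gid
        ($3 in srv) && srv[$3] != $1 { print $3, srv[$3], $1 }
    ' "$1" "$2"
}

# The hastings/battle experiment from above, on scratch files.
srvgrp=${TMPDIR:-/tmp}/group.server.$$
cligrp=${TMPDIR:-/tmp}/group.client.$$
printf '%s\n' 'root::0:' 'staff::10:' 'hastings::1066:' > "$srvgrp"
printf '%s\n' 'root::0:' 'staff::10:' 'battle::1066:' > "$cligrp"
group_mismatches "$srvgrp" "$cligrp"
rm -f "$srvgrp" "$cligrp"
```

Any gid it reports is one that NFSv3 would silently show under the client's name and NFSv4 would show as nobody.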


Linux 2.6 Server - 11/10/2005

Now we add an export to mrx and see how Solaris likes it.

[root@mrx ~]# cat /etc/exports
/loghyr *(rw,fsid=0,insecure,no_subtree_check,sync)
[root@mrx ~]# exportfs -r

What does will think?

# showmount -e mrx
showmount: mrx: RPC: Program not registered

Okay, let's start up the NFS services:

[root@mrx ~]# chkconfig --list | grep nfs
nfs             0:off   1:off   2:off   3:off   4:off   5:off   6:off
nfslock         0:off   1:off   2:off   3:on    4:on    5:on    6:off
[root@mrx ~]# chkconfig nfs on
[root@mrx ~]# chkconfig --list | grep nfs
nfs             0:off   1:off   2:on    3:on    4:on    5:on    6:off
nfslock         0:off   1:off   2:off   3:on    4:on    5:on    6:off
[root@mrx ~]# service nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]

Now, what does will think?

# showmount -e mrx
export list for mrx:
/loghyr *

Great, let's mount it!

# mount mrx:/loghyr  /nfs4/mrx/loghyr
nfs mount: mrx:/loghyr: No such file or directory

Okay, can we at least mount it via NFSv3?

# mount -o vers=3 mrx:/loghyr  /nfs4/mrx/loghyr

Does snoop show anything interesting?

will:> snoop -i /tmp/snoop | grep NFS
  4   0.00029         will -> mrx.excfb.com NFS C NULL4
  6   0.00004 mrx.excfb.com -> will         NFS R NULL4
 14   0.02677         will -> mrx.excfb.com NFS C 4 (secinfo     ) PUTROOTFH SECINFO loghyr
 16   0.00006 mrx.excfb.com -> will         NFS R 4 (secinfo     ) NFS4ERR_OP_ILLEGAL PUTROOTFH NFS4_OK
                                                                     ILLEGAL NFS4ERR_OP_ILLEGAL
 18   0.00004         will -> mrx.excfb.com NFS C 4 (mount       ) PUTROOTFH GETFH LOOKUP loghyr GETFH
                                                                     GETATTR c8000167 0
 19   0.00013 mrx.excfb.com -> will         NFS R 4 (mount       ) NFS4ERR_NOENT PUTROOTFH NFS4_OK GETFH
                                                                     NFS4_OK FH=0015 LOOKUP NFS4ERR_NOENT
 21   0.00003         will -> mrx.excfb.com NFS C 4 (secinfo     ) PUTROOTFH SECINFO loghyr
 22   0.00011 mrx.excfb.com -> will         NFS R 4 (secinfo     ) NFS4ERR_OP_ILLEGAL PUTROOTFH NFS4_OK
                                                                     ILLEGAL NFS4ERR_OP_ILLEGAL
 23   0.00010         will -> mrx.excfb.com NFS C 4 (mount       ) PUTROOTFH GETFH LOOKUP loghyr GETFH
                                                                     GETATTR c8000167 0
 24   0.00012 mrx.excfb.com -> will         NFS R 4 (mount       ) NFS4ERR_NOENT PUTROOTFH NFS4_OK GETFH
                                                                     NFS4_OK FH=0015 LOOKUP NFS4ERR_NOENT
 25   0.00003         will -> mrx.excfb.com NFS C 4 (secinfo     ) PUTROOTFH SECINFO loghyr
 26   0.00011 mrx.excfb.com -> will         NFS R 4 (secinfo     ) NFS4ERR_OP_ILLEGAL PUTROOTFH NFS4_OK
                                                                     ILLEGAL NFS4ERR_OP_ILLEGAL
 27   0.00025         will -> mrx.excfb.com NFS C 4 (mount       ) PUTROOTFH GETFH LOOKUP loghyr GETFH
                                                                     GETATTR c8000167 0
 28   0.00012 mrx.excfb.com -> will         NFS R 4 (mount       ) NFS4ERR_NOENT PUTROOTFH NFS4_OK GETFH
                                                                     NFS4_OK FH=0015 LOOKUP NFS4ERR_NOENT

According to a stale CITI page on using-nfsv4, we need to set up an /export directory and construct a pseudofilesystem (pseudofs) underneath it. According to Learning NFSv4 with Fedora Core 2 (Linux 2.6.5 kernel), we denote the export as belonging to the pseudofs by using the fsid=0 option, which I had used. What am I doing wrong here?

I'm not accessing the correct pseudofs root. Watch as I try to mount the export on mrx:

[root@mrx ~]# mount -t nfs4 mrx:/loghyr /nfs4/mrx/loghyr
mount: special device mrx:/loghyr does not exist

The error doesn't state that I'm sending an illegal operation; it says I can't access the special device. When in doubt while mounting from an NFSv4 server, start with the root:

[root@mrx ~]# mount -t nfs4 mrx:/ /nfs4/mrx/loghyr
[root@mrx ~]# ls -la /nfs4/mrx/loghyr
total 8
drwxr-xr-x  2 nobody nobody 4096 Nov 10 17:11 .
drwxr-xr-x  3 root   root   4096 Nov 10 17:38 ..
[root@mrx ~]# touch /loghyr/it
[root@mrx ~]# ls -la /nfs4/mrx/loghyr
total 8
drwxr-xr-x  2 nobody nobody 4096 Nov 10 17:11 .
drwxr-xr-x  3 root   root   4096 Nov 10 17:38 ..

Blush, if I had bothered to use dmesg, I might have found:

NFS: mount path /loghyr does not exist!
NFS: suggestion: try mounting '/' instead.
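
The rule that tripped me up can be sketched in a few lines of Python. This is only an illustration, assuming the fsid=0 behavior described above, where the export marked fsid=0 becomes the NFSv4 root and clients name everything relative to it. The helper name nfs4_client_path is made up for the example and is not part of any NFS tool.

```python
import posixpath


def nfs4_client_path(server_path, pseudo_root):
    """Return the path an NFSv4 client should ask for.

    server_path -- the directory as it is named on the server
    pseudo_root -- the directory exported with fsid=0 (assumed to be
                   the root of the pseudofs)
    """
    server_path = posixpath.normpath(server_path)
    pseudo_root = posixpath.normpath(pseudo_root)
    if server_path == pseudo_root:
        # The fsid=0 export itself is seen by clients as "/".
        return "/"
    prefix = pseudo_root.rstrip("/") + "/"
    if not server_path.startswith(prefix):
        raise ValueError("%s is not under the pseudofs root %s"
                         % (server_path, pseudo_root))
    return "/" + server_path[len(prefix):]


# With /loghyr exported as fsid=0, the client mounts mrx:/ and not mrx:/loghyr.
print(nfs4_client_path("/loghyr", "/loghyr"))
print(nfs4_client_path("/loghyr/it", "/loghyr"))
```

Under that assumption, asking for mrx:/loghyr is really asking for /loghyr/loghyr on the server, which is why the mount fails.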

Still not right yet. But for right now, what does will think about this change?

# mount mrx:/ /nfs4/mrx/loghyr
# ls -la /nfs4/mrx/loghyr
total 10
drwxr-xr-x   2 root     root        4096 Nov 10 17:47 .
drwxr-xr-x   3 root     root         512 Nov 10 17:20 ..
-rw-r--r--   1 root     root           0 Nov 10 17:47 it

Hey, I didn't expect that will would see it! Let's go back and check mrx:

[root@mrx ~]# ls -la /nfs4/mrx/loghyr
total 8
drwxr-xr-x  2 nobody nobody 4096 Nov 10 17:47 .
drwxr-xr-x  3 root   root   4096 Nov 10 17:38 ..
-rw-r--r--  1 nobody nobody    0 Nov 10 17:47 it

Some more client caching?

Hmm, okay, why does will do the idmapping correctly and mrx not? Is it because the uid is for root?

If we set the owner to a non-root account, what happens?

[root@mrx ~]# chown tdh:100 /loghyr/it
[root@mrx ~]# ls -la /nfs4/mrx/loghyr
total 8
drwxr-xr-x  2 nobody nobody 4096 Nov 10 17:47 .
drwxr-xr-x  3 root   root   4096 Nov 10 17:38 ..
-rw-r--r--  1 nobody nobody    0 Nov 10 17:47 it

Okay, from the Solaris client, what do we see? Also, let's create a file from here.

# touch /nfs4/mrx/loghyr/the_sky
# ls -la /nfs4/mrx/loghyr
total 10
drwxrwxrwx   2 root     root        4096 Nov 10 17:56 .
drwxr-xr-x   3 root     root         512 Nov 10 17:20 ..
-rw-r--r--   1 tdh      users          0 Nov 10 17:47 it
-rw-r--r--   1 nobody   nobody         0 Nov 10 17:56 the_sky

Can the Linux client see the changes made by will? (Okay, I don't have a second Linux box available to me, so the Linux client is mrx itself.)

[root@mrx ~]# ls -la /nfs4/mrx/loghyr
total 8
drwxrwxrwx  2 nobody nobody 4096 Nov 10 17:56 .
drwxr-xr-x  3 root   root   4096 Nov 10 17:38 ..
-rw-r--r--  1 nobody nobody    0 Nov 10 17:47 it
-rw-r--r--  1 nobody nobody    0 Nov 10 17:56 the_sky
[root@mrx ~]# ls -la /loghyr
total 12
drwxrwxrwx   2 root       root       4096 Nov 10 17:56 .
drwxr-xr-x  28 root       root       4096 Nov 10 17:11 ..
-rw-r--r--   1 tdh        users         0 Nov 10 17:47 it
-rw-r--r--   1 4294967294 4294967294    0 Nov 10 17:56 the_sky

The view from the local filesystem is not so nice. Where did it get that large number?

11/12/05 - The value is -2, displayed as an unsigned 32-bit number, and it comes from the default set by the server.
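
The arithmetic is easy to check. Here is a small Python sketch (the function names are mine) showing that 4294967294 and -2 are the same 32 bits read two different ways:

```python
def as_unsigned32(value):
    """Reinterpret an integer as an unsigned 32-bit quantity."""
    return value & 0xFFFFFFFF


def as_signed32(value):
    """Reinterpret the low 32 bits of an integer as a signed quantity."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


print(as_unsigned32(-2))        # 4294967294
print(as_signed32(4294967294))  # -2
```

So ls on the local filesystem is simply printing the raw uid because no account has that number.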

Can we change the ownership from the client?

# chown tdh:100 /nfs4/mrx/loghyr/the_sky
^C^C^C^C#

No. How about from the server via the local filesystem?

[root@mrx ~]# chown tdh:100 /loghyr/the_sky
[root@mrx ~]# ls -la /nfs4/mrx/loghyr
total 8
drwxrwxrwx  2 nobody nobody 4096 Nov 10 17:56 .
drwxr-xr-x  3 root   root   4096 Nov 10 17:38 ..
-rw-r--r--  1 nobody nobody    0 Nov 10 17:47 it
-rw-r--r--  1 nobody nobody    0 Nov 10 17:56 the_sky

Still not able to see the real uid and it is going across the loopback:

[root@mrx ~]# mount | grep mrx
mrx:/ on /nfs4/mrx/loghyr type nfs4 (rw,addr=127.0.0.1)

Can the Solaris client see the new uids?

^C^C^C^C# ls -la /nfs4/mrx/loghyr
total 10
drwxrwxrwx   2 root     root        4096 Nov 10 17:56 .
drwxr-xr-x   3 root     root         512 Nov 10 17:20 ..
-rw-r--r--   1 tdh      users          0 Nov 10 17:47 it
-rw-r--r--   1 tdh      users          0 Nov 10 17:56 the_sky

Yes.

Okay, let's repeat the Solaris client test and get a snoop:

# touch /nfs4/mrx/loghyr/my_heart
# ls -la /nfs4/mrx/loghyr/my_heart
-rw-r--r--   1 nobody   nobody         0 Nov 10 21:26 /nfs4/mrx/loghyr/my_heart
# chown nfsv4:100 /nfs4/mrx/loghyr/my_heart
^C#

So the test case is reproducible.

[root@mrx ~]# ls -la /loghyr
total 12
drwxrwxrwx   2 root       root       4096 Nov 10 21:26 .
drwxr-xr-x  28 root       root       4096 Nov 10 17:11 ..
-rw-r--r--   1 tdh        users         0 Nov 10 17:47 it
-rw-r--r--   1 4294967294 4294967294    0 Nov 10 21:26 my_heart
-rw-r--r--   1 tdh        users         0 Nov 10 17:56 the_sky

The snoop was captured with the command:

will:> sudo snoop -x0,2000 -o /tmp/snoop will mrx

So in summary, there are two bugs, both in the Linux client:

  • Id mapping does not work in the loopback case.
  • Chowns are not working.

Trond and Bruce point out that I need no_root_squash set for the Id mapping to work. I tried that and:

[root@mrx ~]# vi /etc/exports
[root@mrx ~]# exportfs -r
[root@mrx ~]# umount /nfs4/mrx/loghyr
[root@mrx ~]# mount -t nfs4 mrx:/ /nfs4/mrx/loghyr
[root@mrx ~]# ls -al /nfs4/mrx/loghyr
total 8
drwxrwxrwx  2 nobody nobody 4096 Nov 10 22:00 .
drwxr-xr-x  3 root   root   4096 Nov 10 17:38 ..
-rw-r--r--  1 nobody nobody    0 Nov 10 17:47 it
-rw-r--r--  1 nobody nobody    0 Nov 10 21:26 my_heart
-rw-r--r--  1 nobody nobody    0 Nov 10 17:56 the_sky
-rw-r--r--  1 nobody nobody    0 Nov 10 22:00 trond
[root@mrx ~]# touch /nfs4/mrx/loghyr/bruce
[root@mrx ~]# ls -la  /nfs4/mrx/loghyr/bruce
-rw-r--r--  1 nobody nobody 0 Nov 10 22:12 /nfs4/mrx/loghyr/bruce
[root@mrx ~]# uname -a
Linux mrx.excfb.com 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686 i386 GNU/Linux
[root@mrx ~]# more /etc/exports
/loghyr *(rw,fsid=0,insecure,no_subtree_check,sync,no_root_squash)

It still fails. The good news is that it is better on the Solaris client:

# ls -la /nfs4/mrx/loghyr/
total 10
drwxrwxrwx   2 root     root        4096 Nov 10 22:12 .
drwxr-xr-x   3 root     root         512 Nov 10 17:20 ..
-rw-r--r--   1 root     root           0 Nov 10 22:12 bruce
-rw-r--r--   1 tdh      users          0 Nov 10 17:47 it
-rw-r--r--   1 nobody   nobody         0 Nov 10 21:26 my_heart
-rw-r--r--   1 tdh      users          0 Nov 10 17:56 the_sky
-rw-r--r--   1 nobody   nobody         0 Nov 10 22:00 trond

So I do see a difference across the wire with the option set: bruce is owned by root, not by nobody like trond.

Trond has this to say about the Id mapping:

> Yeah... We currently have a problem: /etc/idmapd.conf specifies a
> default user/group that the NFSv4 "unknown user" maps to. At the same
> time, we have a different mapping of "anonuid/anongid" in /etc/exports
> that squashed users get mapped to.
> 
> Unless you explicitly keep the two in sync, then the above sort of
> weirdness appears to happen.
> 
> This clearly needs to be fixed, but we haven't yet gotten round to
> working out how we want to do this.
> 
> Cheers,
>   Trond
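
To make Trond's point concrete, here is a rough Python sketch of the kind of check I would want. It pulls anonuid/anongid out of an /etc/exports option string and compares them with the numeric ids that the idmapd "unknown user" and group resolve to on the box. The function names and the id 99 are made up for illustration, and the caller has to supply the idmapd side because this sketch does not read /etc/idmapd.conf.

```python
import re


def export_anon_ids(options):
    """Return (anonuid, anongid) from an exports option string.

    An entry is None when the option is not given explicitly.
    """
    found = {}
    for opt in options.split(","):
        match = re.fullmatch(r"(anonuid|anongid)=(-?\d+)", opt.strip())
        if match:
            found[match.group(1)] = int(match.group(2))
    return found.get("anonuid"), found.get("anongid")


def in_sync(options, nobody_uid, nobody_gid):
    """True only when both anon ids are set explicitly and match idmapd's."""
    uid, gid = export_anon_ids(options)
    return uid == nobody_uid and gid == nobody_gid


# My export line sets neither option, so nothing is explicitly in sync.
mine = "rw,fsid=0,insecure,no_subtree_check,sync,no_root_squash"
print(export_anon_ids(mine))
print(in_sync("rw,anonuid=99,anongid=99", 99, 99))
```

Treating "not set" as "not in sync" is a design choice that follows Trond's wording about keeping the two explicitly in sync.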

Data Ontap 7.0.2 Server - 11/10/2005

I'm now going to introduce a NetApp simulator to the mix. It is a stock 7.0.2 and has just been cleanly configured.

simcity> version
NetApp Release 7.0.2: Sat Oct  8 00:03:13 PDT 2005

Note that Data Ontap 7.0.1R1 is the first NetApp release for NFSv4 which is optimized for the release of Solaris 10.

I first create an aggregate to contain the volume and then the volume which I will export:

simcity> vol create loghyr -s none aggr1 1500k
simcity> vol create loghyr -s none aggr1 40m
simcity> df
Filesystem              kbytes       used      avail capacity  Mounted on
/vol/vol0/              120000      74992      45008      62%  /vol/vol0/
/vol/vol0/.snapshot      30000          0      30000       0%  /vol/vol0/.snapshot
/vol/loghyr/             32768       1408      31360       4%  /vol/loghyr/
/vol/loghyr/.snapshot       8192          0       8192       0%  /vol/loghyr/.snapshot
simcity> exportfs
/vol/vol0/home  -sec=sys,rw,nosuid
/vol/vol0       -sec=sys,rw,anon=0,nosuid
/vol/loghyr     -sec=sys,rw,nosuid

Note that the system automatically creates a default export for me. I can disable that functionality by setting the option nfs.export.auto-update to off:

simcity> options nfs.export.auto-update off

Okay, what does the Solaris 10 client will think of the filer?

will:> sudo showmount -e simcity
export list for simcity:
/vol/vol0/home (everyone)
/vol/vol0      (everyone)
/vol/loghyr    (everyone)

Just for fun, what does the Solaris 10 client ultralord think of the filer?

ultralord:> sudo showmount -e simcity
^C

It doesn't think much of it? Why not? The simulator is running on ultralord and is camping on the same interface as the Sun box. You have to use another machine for testing.

Okay, so now we try to mount from will:

will:> sudo mount simcity:/vol/loghyr /nfs/simcity/loghyr
will:> ls -la /nfs/simcity/loghyr
total 18
drwxr-xr-x   3 root     root        4096 Nov 11 00:10 .
drwxr-xr-x   4 root     root         512 Nov 11 00:18 ..
drwxrwxrwx   2 root     root        4096 Nov 11 00:10 .snapshot
will:> sudo touch /nfs/simcity/loghyr/it
touch: /nfs/simcity/loghyr/it cannot create

Okay, the default export for /vol/loghyr was not permissive enough. We can fix that easily:

simcity> exportfs -p rw,anon=0 /vol/loghyr
simcity> exportfs
/vol/vol0/home  -sec=sys,rw,nosuid
/vol/vol0       -sec=sys,rw,anon=0,nosuid
/vol/loghyr     -sec=sys,rw,anon=0

And we start back up on will:

will:> sudo touch /nfs/simcity/loghyr/it
will:> ls -al /nfs/simcity/loghyr/it
-rw-r--r--   1 root     root           0 Nov 11  2005 /nfs/simcity/loghyr/it
will:> sudo chown nfsv4:100 /nfs/simcity/loghyr/it
will:> ls -al /nfs/simcity/loghyr/it
-rw-r--r--   1 nfsv4    users          0 Nov 11  2005 /nfs/simcity/loghyr/it

Okay, everything looks fine. Time to get out the Linux client mrx:

[root@mrx ~]# showmount -e simcity
Export list for simcity:
/vol/vol0/home (everyone)
/vol/vol0      (everyone)
/vol/loghyr    (everyone)
[root@mrx ~]# mount -t nfs4 simcity:/vol/loghyr /nfs4/simcity/loghyr
mount to NFS server 'simcity' failed.

Okay, what is up here? Let's check our message logs. Nothing there. When in doubt, start with the root of the machine:

[root@mrx ~]# mount -t nfs4 simcity:/ /nfs4/simcity/loghyr
mount to NFS server 'simcity' failed.

What does snoop tell us?

will:> sudo snoop -x0,2000 -o /tmp/snoop mrx simcity
Using device /dev/iprb0 (promiscuous mode)
0 ^C

Nothing. We can either use pktt on the simulator or tcpdump from the Linux box. Okay, this is ugly, but I am learning new things:

[root@mrx ~]# tcpdump -s 9000 -w /tmp/dump.out port 2049
[root@mrx ~]# tethereal -V -r /tmp/dump.out > /tmp/xxx
[root@mrx ~]# vi /tmp/xxx
...
   Message Type: Reply (1)
    Program: NFS (100003)
    Program Version: 4
    Procedure: NULL (0)
    Reply State: accepted (0)
    This is a reply to a request in frame 4
    Time from request: 0.000825000 seconds
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
    Accept State: remote can't support version # (2)
    Program Version (Minimum): 2
    Program Version (Maximum): 3
...

mrx sent an NFSv4 NULL RPC. simcity said no, I don't support version 4, but I can support versions from 2 to 3.

Why did the Solaris client work? Well, it autonegotiates. When the filer responds with "I don't do NFSv4.", the client looks at the reply and sends off an equivalent NFSv3 request.

We can see that by asking the client to mount only via NFSv4:
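
Here is a toy Python model of that fallback. It is not the Solaris client's code, just my reading of the trace: the reply carries the lowest and highest versions the server supports (2 and 3 above), and a client that is willing to fall back picks the best version in that range. The function name and the client_supported parameter are made up for the sketch.

```python
def pick_version(wanted, server_low, server_high, client_supported=(4, 3, 2)):
    """Pick an NFS version after a "can't support version" reply.

    Returns the wanted version if the server accepts it, otherwise the
    highest lower version that both sides support, or None if there is none.
    """
    if server_low <= wanted <= server_high:
        return wanted
    for version in sorted(client_supported, reverse=True):
        if version < wanted and server_low <= version <= server_high:
            return version
    return None


# A client that autonegotiates ends up on NFSv3.
print(pick_version(4, 2, 3))
# A client asked for NFSv4 only has nowhere to go.
print(pick_version(4, 2, 3, client_supported=(4,)))
```

The second call models both mount -t nfs4 on Linux and mount -o vers=4 on Solaris, which refuse to fall back.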

will:> sudo mount -o vers=4 simcity:/vol/loghyr /mnt
nfs mount: simcity NFS service not available RPC: Success
nfs mount: retrying: /mnt

But there is no way, that I know of, to see what version you have mounted after the mount succeeds:

will:> sudo mount -o vers=3 simcity:/vol/loghyr /mnt
will:> mount | grep simcity
/nfs/simcity/vol0 on simcity:/vol/vol0 remote/read/write/setuid/devices/xattr/dev=4700006 on Thu Nov 10 23:56:50 2005
/nfs/simcity/loghyr on simcity:/vol/loghyr remote/read/write/setuid/devices/xattr/dev=4700007 on Fri Nov 11 00:18:21 2005
/mnt on simcity:/vol/loghyr remote/read/write/setuid/devices/vers=3/xattr/dev=4700008 on Fri Nov 11 00:56:47 2005

Since we explicitly set vers=3 for /mnt, we can tell. It is late (or early), and I thought Jeff Smith at Sun had told me how to determine this info. I'll bug him about it later.
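
In the meantime, the best I can do is scrape the mount output. This Python sketch assumes the line layout shown above (mount point, "on", resource, slash-separated options) and only finds a version when vers= was given explicitly, which is exactly the limitation I am complaining about. The function name is mine.

```python
def mount_version(mount_line):
    """Return the vers= value from a Solaris mount(1M) line, or None."""
    parts = mount_line.split()
    # Layout seen above: <mountpoint> on <resource> <opt/opt/...> on <date>
    if len(parts) < 4 or parts[1] != "on":
        return None
    for opt in parts[3].split("/"):
        if opt.startswith("vers="):
            return int(opt[len("vers="):])
    return None


explicit = ("/mnt on simcity:/vol/loghyr remote/read/write/setuid/devices/"
            "vers=3/xattr/dev=4700008 on Fri Nov 11 00:56:47 2005")
implicit = ("/nfs/simcity/loghyr on simcity:/vol/loghyr remote/read/write/"
            "setuid/devices/xattr/dev=4700007 on Fri Nov 11 00:18:21 2005")
print(mount_version(explicit))
print(mount_version(implicit))
```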

Back to the filer - it ships by default with NFSv4 disabled. We have to enable it:

simcity> options nfs.v
nfs.v2.df_2gb_lim            off
nfs.v3.enable                on
nfs.v4.acl.enable            off
nfs.v4.enable                off
nfs.v4.id.domain             excfb.com
nfs.v4.read_delegation       off
nfs.v4.write_delegation      off
simcity> options nfs.v4.enable on

And by the way, the option nfs.v4.id.domain is how you configure the Id mapping for the filer.

Can the Linux client mount now?

[root@mrx ~]# mount -t nfs4 simcity:/vol/loghyr /nfs4/simcity/loghyr
[root@mrx ~]# ls -la /nfs4/simcity/loghyr
total 12
drwxr-xr-x  3 nobody nobody 4096 Nov 11 00:23 .
drwxr-xr-x  3 root   root   4096 Nov 11 00:22 ..
-rw-r--r--  1 nobody nobody    0 Nov 11 00:23 it
drwxrwxrwx  2 nobody nobody 4096 Nov 11 00:10 .snapshot

Time to remount the filer on will:

will:> sudo mount -o vers=4 simcity:/vol/loghyr /nfs/simcity/loghyr
will:> ls -la /nfs/simcity/loghyr
total 18
drwxr-xr-x   3 nobody   nobody      4096 Nov 11 00:23 .
drwxr-xr-x   4 root     root         512 Nov 11 00:18 ..
drwxrwxrwx   2 nobody   nobody      4096 Nov 11 00:10 .snapshot
-rw-r--r--   1 nobody   nobody         0 Nov 11 00:23 it

I already mentioned that nfs.v4.id.domain is the option to control the Id mapping. It matches what both the Sun and Linux boxes have, so why the failure?

We have a fresh install on the simulator and NIS is not configured. What values are there in the local name service files?

will:> more /nfs/simcity/vol0/etc/passwd
/nfs/simcity/vol0/etc/passwd: No such file or directory
will:> more /nfs/simcity/vol0/etc/group
/nfs/simcity/vol0/etc/group: No such file or directory
will:> sudo cp /etc/passwd /nfs/simcity/vol0/etc/passwd
will:> sudo cp /etc/group /nfs/simcity/vol0/etc/group

How about it now?

will:> ls -la /nfs/simcity/loghyr
total 18
drwxr-xr-x   3 root     root        4096 Nov 11 00:23 .
drwxr-xr-x   4 root     root         512 Nov 11 00:18 ..
drwxrwxrwx   2 root     root        4096 Nov 11 00:10 .snapshot
-rw-r--r--   1 nfsv4    users          0 Nov 11 00:23 it

and

[root@mrx ~]# ls -la /nfs4/simcity/loghyr
total 12
drwxr-xr-x  3 root  root  4096 Nov 11 00:23 .
drwxr-xr-x  3 root  root  4096 Nov 11 00:22 ..
-rw-r--r--  1 nfsv4 users    0 Nov 11 00:23 it
drwxrwxrwx  2 root  root  4096 Nov 11 00:10 .snapshot
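
What this shows is that a matching Id domain is necessary but not sufficient: each side also has to be able to turn ids into names and back. Here is a rough Python sketch of both directions. The function names, the uid 1001, and the choice of "nobody" as the fallback are assumptions made for the example, not what Data Ontap or any client literally does.

```python
NOBODY = "nobody"


def to_wire(uid, names_by_uid, domain):
    """Server side: turn a uid into the name@domain string sent to clients."""
    name = names_by_uid.get(uid, NOBODY)
    return "%s@%s" % (name, domain)


def from_wire(owner, domain, known_names):
    """Client side: turn a name@domain string back into a local name."""
    name, sep, owner_domain = owner.rpartition("@")
    if not sep or owner_domain != domain or name not in known_names:
        return NOBODY
    return name


# Before copying passwd to the filer it knows no names at all.
print(to_wire(1001, {}, "excfb.com"))
# Afterwards the uid resolves, and the client can map it back.
wire = to_wire(1001, {1001: "nfsv4"}, "excfb.com")
print(wire, from_wire(wire, "excfb.com", {"root", "nfsv4"}))
```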

Playing with automounters - 11/11/2005

So Solaris comes with its automounter on by default. Let's see what happens!

From a Solaris client:

ultralord:> cd /net/will/loghyr
ultralord:> ls -la
total 149
drwxr-xr-x  10 tdh      root         512 Nov 11 12:59 .
dr-xr-xr-x   2 root     root           2 Nov 11 13:23 ..
drwxr-xr-x   3 root     root         512 Nov 10 10:58 Solaris10
drwxr-xr-x   3 root     root         512 Nov  9 16:37 fedora
drwxr-xr-x  11 root     root         512 Apr  4  2005 home
drwxr-xr-x   3 root     root         512 Mar 12  2005 html
-rw-r--r--   1 root     root       34654 Nov  9 17:10 httpd.conf
drwxr-xr-x   2 tdh      staff        512 Nov 10 00:58 isos
lrwxrwxrwx   1 tdh      staff         14 Nov 11 12:59 lilly -> will2mrx.snoop
drwx------   2 root     root        8192 Nov  7 21:27 lost+found
-rw-r--r--   1 nfsv4    users          0 Nov 10 21:32 my_heart
drwxr-x---   4 root     smmsp        512 Dec  2  2004 named
-rw-r--r--   1 root     nobody         0 Nov 10 15:13 norman
-rw-r--r--   1 root     root        8793 Nov  9 02:24 readme.txt
-r--r--r--   1 root     root        1609 Nov 11 12:47 sendmail.mc
drwxr-xr-x   2 root     root         512 Nov 10 00:47 spool
-rw-r--r--   1 root     root       11544 Nov 10 21:31 will2mrx.snoop

Note that we can tell that this is NFSv4 from the gid for norman:

ultralord:> sudo mount -o vers=3 will:/loghyr /mnt
ultralord:> ls -la /mnt/norman
-rw-r--r--   1 root     battle         0 Nov 10 15:13 /mnt/norman
ultralord:> grep battle /etc/group
battle::1066:

Now, what do we see from the Linux server?

ultralord:> cd /net/mrx
ultralord:> ls -la
total 4
dr-xr-xr-x   3 root     root           3 Nov 11 13:27 .
dr-xr-xr-x   3 root     root           3 Nov  8 23:05 ..
dr-xr-xr-x   1 root     root           1 Nov 11 13:27 export
dr-xr-xr-x   1 root     root           1 Nov 11 13:27 loghyr
ultralord:> cd export
export: Permission denied.

Where does that directory export come from? My guess is that it is an automatic entry for the pseudofs.

Bzzt! Further analysis has shown it to be pilot error induced by trying to follow different how-to guides.

[root@mrx ~]# exportfs
/loghyr         
[root@mrx ~]# showmount -e
Export list for mrx.excfb.com:
/export gss/krb5p,gss/krb5i,gss/krb5,*
/loghyr *
[root@mrx ~]# exportfs -au
[root@mrx ~]# exportfs
[root@mrx ~]# exportfs -r
[root@mrx ~]# exportfs
/loghyr         
[root@mrx ~]# showmount -e
Export list for mrx.excfb.com:
/loghyr *

Okay, clearly pilot error for "/export".

ultralord:> showmount -e mrx
export list for mrx:
/loghyr *

Back to the action!

[root@mrx ~]# exportfs
/loghyr         

What about the one we should be able to get into?

ultralord:> cd loghyr
loghyr: Permission denied.

Snoop alert!

ultralord:> sudo snoop -x0,2000 -o /tmp/snoop mrx ultralord
Using device /dev/hme (promiscuous mode)
82 ^C

Hmm, here is the snoop file. I think the Linux box is confused about what the root is for the pseudofs. Can we mount it manually?

ultralord:> sudo mkdir -p /nfs/mrx/loghyr
ultralord:> sudo mount mrx:/ /nfs/mrx/loghyr
ultralord:> cd /nfs/mrx/loghyr
ultralord:> ls -la
total 10
drwxrwxrwx   2 root     root        4096 Nov 10 22:41 .
drwxr-xr-x   3 root     root         512 Nov 11 13:40 ..
-rw-r--r--   1 root     root           0 Nov 10 22:12 bruce
-rw-r--r--   1 tdh      users          0 Nov 10 17:47 it
-rw-r--r--   1 nobody   nobody         0 Nov 10 21:26 my_heart
-rw-r--r--   1 root     root           0 Nov 10 22:41 spencer
-rw-r--r--   1 tdh      users          0 Nov 10 17:56 the_sky
-rw-r--r--   1 nobody   nobody         0 Nov 10 22:00 trond

Yes.

What about the filer? Can we automount it? Since the simulator is running on ultralord, let's move to sandman:

sandman.excfb.com:> uname -a
SunOS sandman.excfb.com 5.10 Generic sun4u sparc SUNW,Ultra-5_10
sandman.excfb.com:> cd /net
sandman.excfb.com:> cd simcity
sandman.excfb.com:> ls -la
total 3
dr-xr-xr-x   2 root     root           2 Nov 11 13:51 .
dr-xr-xr-x   2 root     root           2 Nov 11 13:33 ..
dr-xr-xr-x   1 root     root           1 Nov 11 13:51 vol
sandman.excfb.com:> cd vol
sandman.excfb.com:> ls -la
total 4
dr-xr-xr-x   3 root     root           3 Nov 11 13:51 .
dr-xr-xr-x   2 root     root           2 Nov 11 13:51 ..
dr-xr-xr-x   1 root     root           1 Nov 11 13:51 loghyr
dr-xr-xr-x   1 root     root           1 Nov 11 13:51 vol0
sandman.excfb.com:> cd loghyr
sandman.excfb.com:> ls -la
total 17
drwxr-xr-x   3 root     root        4096 Nov 11 00:23 .
dr-xr-xr-x   3 root     root           3 Nov 11 13:51 ..
drwxrwxrwx   5 root     root        4096 Nov 11 10:00 .snapshot
-rw-r--r--   1 nfsv4    users          0 Nov 11 00:23 it

Okay, it works.


Followup on Linux Server - 11/12/2005

I upgraded mrx and wanted to see if it still had Id map issues:

[root@mrx log]# uname -a
Linux mrx.excfb.com 2.6.13-1.1532_FC4 #1 Thu Oct 20 01:30:08 EDT 2005 i686 i686 i386 GNU/Linux
[root@mrx ~]# mkdir -p /nfs4/mrx/loghyr
[root@mrx ~]# mount -t nfs4 mrx:/ /nfs4/mrx/loghyr
[root@mrx ~]# ls -al /nfs4/mrx/loghyr
total 8
drwxrwxrwx  2 nobody nobody 4096 Nov 10 22:41 .
drwxr-xr-x  3 root   root   4096 Nov 10 17:38 ..
-rw-r--r--  1 nobody nobody    0 Nov 10 22:12 bruce
-rw-r--r--  1 nobody nobody    0 Nov 10 17:47 it
-rw-r--r--  1 nobody nobody    0 Nov 10 21:26 my_heart
-rw-r--r--  1 nobody nobody    0 Nov 10 22:41 spencer
-rw-r--r--  1 nobody nobody    0 Nov 10 17:56 the_sky
-rw-r--r--  1 nobody nobody    0 Nov 10 22:00 trond
[root@mrx ~]# ls -la /loghyr
total 12
drwxrwxrwx   2 root       root       4096 Nov 10 22:41 .
drwxr-xr-x  28 root       root       4096 Nov 11 17:30 ..
-rw-r--r--   1 root       root          0 Nov 10 22:12 bruce
-rw-r--r--   1 tdh        users         0 Nov 10 17:47 it
-rw-r--r--   1 4294967294 4294967294    0 Nov 10 21:26 my_heart
-rw-r--r--   1 root       root          0 Nov 10 22:41 spencer
-rw-r--r--   1 tdh        users         0 Nov 10 17:56 the_sky
-rw-r--r--   1 4294967294 4294967294    0 Nov 10 22:00 trond

Okay, do the logs say anything?

[root@mrx log]# tail /var/log/messages
Nov 12 19:55:24 mrx kernel: nfsd: nfsv4 idmapping failing: has idmapd not been started?

Okay, is it running?

[root@mrx log]# ps aux | grep idmap
root      1837  0.0  0.1   4376  1336 ?        Ss   Nov11   0:00 rpc.idmapd
root      1555  0.0  0.0   3760   664 pts/2    R+   20:00   0:00 grep idmap

Time to dig up another Linux client, this time adept:

[tdh@adept ~]$ uname -a
Linux adept.excfb.com 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686 i386 GNU/Linux
[tdh@adept ~]$ ps aux | grep idmap
root      1680  0.0  0.1   4380  1064 ?        Ss   09:36   0:00 rpc.idmapd
tdh       9389  0.0  0.0   3764   672 pts/9    R+   20:07   0:00 grep idmap
[tdh@adept ~]$ sudo mkdir -p /nfs4/mrx/loghyr
[tdh@adept ~]$ sudo mount -t nfs4 mrx:/ /nfs4/mrx/loghyr
[tdh@adept ~]$ ls -la /nfs4/mrx/loghyr
total 8
drwxrwxrwx  2 nobody nobody 4096 Nov 10 22:41 .
drwxr-xr-x  3 root   root   4096 Nov 12 20:07 ..
-rw-r--r--  1 nobody nobody    0 Nov 10 22:12 bruce
-rw-r--r--  1 nobody nobody    0 Nov 10 17:47 it
-rw-r--r--  1 nobody nobody    0 Nov 10 21:26 my_heart
-rw-r--r--  1 nobody nobody    0 Nov 10 22:41 spencer
-rw-r--r--  1 nobody nobody    0 Nov 10 17:56 the_sky
-rw-r--r--  1 nobody nobody    0 Nov 10 22:00 trond

And nothing in the message files.

Hmm, it doesn't think that the idmapper is running?

[tdh@adept ~]$  more /proc/net/rpc/nfs4.idtoname/content
/proc/net/rpc/nfs4.idtoname/content: No such file or directory
[tdh@adept ~]$ cd /proc/net/rpc
[tdh@adept rpc]$ ls -la
total 0
dr-xr-xr-x  4 root root 0 Nov 12 20:09 .
dr-xr-xr-x  5 root root 0 Nov 12 20:09 ..
dr-xr-xr-x  2 root root 0 Nov 12 20:09 auth.domain
dr-xr-xr-x  2 root root 0 Nov 12 20:09 auth.unix.ip
-r--r--r--  1 root root 0 Nov 12 20:09 nfs
[root@adept ~]# chkconfig --list | grep nfs
nfs             0:off   1:off   2:off   3:off   4:off   5:off   6:off
nfslock         0:off   1:off   2:off   3:on    4:on    5:on    6:off
[root@adept ~]# service nfs restart
Shutting down NFS mountd:                                  [FAILED]
Shutting down NFS daemon:                                  [FAILED]
Shutting down NFS quotas:                                  [FAILED]
Shutting down NFS services:                                [  OK  ]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
[root@adept ~]#                                            [  OK  ]
[root@adept ~]# more /proc/net/rpc/nfs4.idtoname/content
#domain type id [name]

Is the Id mapping just done for servers on Linux?

And it looks like we can hang a Linux client when we do a chown:

[root@adept ~]# touch /nfs4/mrx/loghyr/sky
[root@adept ~]# chown nfsv4:100 /nfs4/mrx/loghyr/sky

I wonder if I see problems because I'm not using Kerberos yet?

I found this link on NFS troubleshooting. I'm now enabling debugging in the RPC layer:

[root@mrx ~]# echo 2048 > /proc/sys/sunrpc/rpc_debug
/proc/net/rpc/nfs4.idtoname/content:#domain type id [name]
/proc/net/rpc/nfs4.idtoname/content:# expiry=1131856223 refcnt=1
/proc/net/rpc/nfs4.idtoname/content:# * group 0
/proc/net/rpc/nfs4.idtoname/content:# expiry=1131856223 refcnt=1
/proc/net/rpc/nfs4.idtoname/content:# * user 0
/proc/net/rpc/nfs4.idtoname/content:# expiry=1131856223 refcnt=1
/proc/net/rpc/nfs4.idtoname/content:# * user -2
/proc/net/rpc/nfs4.idtoname/content:# expiry=1131856223 refcnt=1
/proc/net/rpc/nfs4.idtoname/content:# * group -2
/proc/net/rpc/nfs4.idtoname/content:# expiry=1131856223 refcnt=1
/proc/net/rpc/nfs4.idtoname/content:# * user 1066
/proc/net/rpc/nfs4.idtoname/content:# expiry=1131856223 refcnt=1
/proc/net/rpc/nfs4.idtoname/content:# * group 100
/proc/net/rpc/nfs4.idtoname/content:# expiry=1131856223 refcnt=1
/proc/net/rpc/nfs4.idtoname/content:# * user 3530
[root@mrx ~]# grep idmapd /var/log/messages
Nov 10 17:18:54 mrx rpc.idmapd: nfsdreopen: Opening '' failed: errno 2 (No such file or directory)
Nov 10 17:39:01 mrx kernel: nfsd: nfsv4 idmapping failing: has idmapd not been started?
Nov 10 17:57:50 mrx rpc.idmapd: nfsdcb: write(/proc/net/rpc/nfs4.nametoid/channel) failed: errno 22 (Invalid argument)
Nov 10 21:27:22 mrx rpc.idmapd: nfsdcb: write(/proc/net/rpc/nfs4.nametoid/channel) failed: errno 22 (Invalid argument)
Nov 10 22:42:11 mrx rpc.idmapd: nfsdcb: write(/proc/net/rpc/nfs4.nametoid/channel) failed: errno 22 (Invalid argument)
Nov 11 17:30:37 mrx rpc.idmapd: nfsdreopen: Opening '' failed: errno 2 (No such file or directory)
Nov 12 19:55:24 mrx kernel: nfsd: nfsv4 idmapping failing: has idmapd not been started?
Nov 12 20:13:22 mrx rpc.idmapd: nfsdcb: write(/proc/net/rpc/nfs4.nametoid/channel) failed: errno 22 (Invalid argument)

So it looks like the correct uid and gid values are there.
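
Reading that cache dump by eye gets old quickly. Here is a small Python sketch that pulls the entries out of it, assuming the layout shown in the header line (domain, type, id, optional name) and the file-name prefix from the listing above. The function name is made up.

```python
def parse_idtoname(lines):
    """Return (domain, type, id, name) tuples from nfs4.idtoname/content."""
    entries = []
    for line in lines:
        # Drop a leading "<file>:" prefix if one is present.
        _, _, text = line.rpartition("content:")
        fields = text.lstrip("#").split()
        # Skip the header and the "expiry=... refcnt=..." lines.
        if len(fields) < 3 or fields[1] not in ("user", "group"):
            continue
        name = fields[3] if len(fields) > 3 else None
        entries.append((fields[0], fields[1], int(fields[2]), name))
    return entries


sample = [
    "/proc/net/rpc/nfs4.idtoname/content:#domain type id [name]",
    "/proc/net/rpc/nfs4.idtoname/content:# expiry=1131856223 refcnt=1",
    "/proc/net/rpc/nfs4.idtoname/content:# * group 0",
    "/proc/net/rpc/nfs4.idtoname/content:# expiry=1131856223 refcnt=1",
    "/proc/net/rpc/nfs4.idtoname/content:# * user -2",
]
for entry in parse_idtoname(sample):
    print(entry)
```

Every entry in my dump comes back with no name, which fits the picture of a kernel that is asking for mappings and not getting answers.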

This is sweet: I added -v to the options in /etc/init.d/rpcidmapd and restarted it:

[root@mrx init.d]# service rpcidmapd restart
Shutting down RPC idmapd:                                  [  OK  ]
Starting RPC idmapd:                                       [  OK  ]
[root@mrx init.d]# ls -al /nfs4/mrx//loghyr
total 8
drwxrwxrwx  2 root   root   4096 Nov 12 20:13 .
drwxr-xr-x  3 root   root   4096 Nov 10 17:38 ..
-rw-r--r--  1 root   root      0 Nov 10 22:12 bruce
-rw-r--r--  1 tdh    users     0 Nov 10 17:47 it
-rw-r--r--  1 nobody nobody    0 Nov 10 21:26 my_heart
-rw-r--r--  1 nfsv4  users     0 Nov 12 20:13 sky
-rw-r--r--  1 root   root      0 Nov 10 22:41 spencer
-rw-r--r--  1 tdh    users     0 Nov 10 17:56 the_sky
-rw-r--r--  1 nobody nobody    0 Nov 10 22:00 trond

And now it works. Furthermore, the chown test works:

[root@mrx init.d]# touch /nfs4/mrx/fooper
[root@mrx init.d]# ls -la /nfs4/mrx/fooper
-rw-r--r--  1 root root 0 Nov 12 22:46 /nfs4/mrx/fooper
[root@mrx init.d]# chown nfsv4:100 /nfs4/mrx/fooper

Here is what was in the log file:

Nov 12 22:43:59 mrx rpc.idmapd: Opened /proc/net/rpc/nfs4.nametoid/channel
Nov 12 22:43:59 mrx rpc.idmapd: Opened /proc/net/rpc/nfs4.idtoname/channel
Nov 12 22:43:59 mrx rpc.idmapd: New client: 0
Nov 12 22:43:59 mrx rpc.idmapd: Opened /var/lib/nfs/rpc_pipefs/nfs/clnt0/idmap
Nov 12 22:44:13 mrx rpc.idmapd: nfsdcb: authbuf=* authtype=user
Nov 12 22:44:13 mrx rpc.idmapd: nfsdcb: authbuf=* authtype=group
Nov 12 22:44:13 mrx rpc.idmapd: nfsdcb: authbuf=* authtype=user
Nov 12 22:44:13 mrx rpc.idmapd: nfsdcb: authbuf=* authtype=group
Nov 12 22:44:13 mrx rpc.idmapd: nfsdcb: authbuf=* authtype=user
Nov 12 22:44:13 mrx rpc.idmapd: nfsdcb: authbuf=* authtype=group
Nov 12 22:44:13 mrx rpc.idmapd: nfsdcb: authbuf=* authtype=user

I think just restarting the idmapper works. Hmm, I wonder if it is the local which is the issue? I.e., I jumped back on adept and tried this without changing the flags:

[root@adept ~]# service rpcidmapd restart
Shutting down RPC idmapd:                                  [  OK  ]
Starting RPC idmapd:                                       [  OK  ]
[root@adept ~]# ls -la /nfs4/mrx/loghyr
total 8
drwxrwxrwx  2 root   root   4096 Nov 12 20:13 .
drwxr-xr-x  3 root   root   4096 Nov 12 20:07 ..
-rw-r--r--  1 root   root      0 Nov 10 22:12 bruce
-rw-r--r--  1 tdh    users     0 Nov 10 17:47 it
-rw-r--r--  1 nobody nobody    0 Nov 10 21:26 my_heart
-rw-r--r--  1 nfsv4  users     0 Nov 12 20:13 sky
-rw-r--r--  1 root   root      0 Nov 10 22:41 spencer
-rw-r--r--  1 tdh    users     0 Nov 10 17:56 the_sky
-rw-r--r--  1 nobody nobody    0 Nov 10 22:00 trond

But I should have tested this before I reloaded.

The Solaris clients started succeeding in their chown:

# uname -a
SunOS sandman.excfb.com 5.10 Generic sun4u sparc SUNW,Ultra-5_10
# mkdir -p /nfs/mrx/loghyr
# mount -o intr mrx:/ /nfs/mrx/loghyr
# ls -la /nfs/mrx/loghyr
total 10
drwxrwxrwx   2 root     root        4096 Nov 12 20:13 .
drwxr-xr-x   3 root     root         512 Nov 12 22:55 ..
-rw-r--r--   1 root     root           0 Nov 10 22:12 bruce
-rw-r--r--   1 tdh      users          0 Nov 10 17:47 it
-rw-r--r--   1 nobody   nobody         0 Nov 10 21:26 my_heart
-rw-r--r--   1 nfsv4    users          0 Nov 12 20:13 sky
-rw-r--r--   1 root     root           0 Nov 10 22:41 spencer
-rw-r--r--   1 tdh      users          0 Nov 10 17:56 the_sky
-rw-r--r--   1 nobody   nobody         0 Nov 10 22:00 trond
# touch /nfs/mrx/loghyr/sandman
# chown nfsv4:100 /nfs/mrx/loghyr/sandman

Time to reboot mrx and see what happens.

Okay, rpcidmapd is not loading upon reboot.

[root@mrx ~]# mount -t nfs4 mrx:/ /nfs4/mrx/loghyr
[root@mrx ~]# ls -la  /nfs4/mrx/loghyr
total 8
drwxrwxrwx  2 nobody nobody 4096 Nov 12 22:54 .
drwxr-xr-x  3 root   root   4096 Nov 12 22:46 ..
-rw-r--r--  1 nobody nobody    0 Nov 10 22:12 bruce
-rw-r--r--  1 nobody nobody    0 Nov 10 17:47 it
-rw-r--r--  1 nobody nobody    0 Nov 10 21:26 my_heart
-rw-r--r--  1 nobody nobody    0 Nov 12 22:54 sandman
-rw-r--r--  1 nobody nobody    0 Nov 12 20:13 sky
-rw-r--r--  1 nobody nobody    0 Nov 10 22:41 spencer
-rw-r--r--  1 nobody nobody    0 Nov 10 17:56 the_sky
-rw-r--r--  1 nobody nobody    0 Nov 10 22:00 trond
[root@mrx ~]# grep idmap /var/log/messages
Nov 12 23:03:30 mrx rpc.idmapd: nfsdreopen: Opening '' failed: errno 2 (No such file or directory)
Nov 12 23:05:45 mrx kernel: nfsd: nfsv4 idmapping failing: has idmapd not been started?
[root@mrx ~]# date
Sat Nov 12 23:07:15 CST 2005

And furthermore, it hoses up the clients:

[root@adept ~]# ls -la /nfs4/mrx/loghyr
total 8
drwxrwxrwx  2 nobody nobody 4096 Nov 12 22:54 .
drwxr-xr-x  3 root   root   4096 Nov 12 20:07 ..
-rw-r--r--  1 nobody nobody    0 Nov 10 22:12 bruce
-rw-r--r--  1 nobody nobody    0 Nov 10 17:47 it
-rw-r--r--  1 nobody nobody    0 Nov 10 21:26 my_heart
-rw-r--r--  1 nobody nobody    0 Nov 12 22:54 sandman
-rw-r--r--  1 nobody nobody    0 Nov 12 20:13 sky
-rw-r--r--  1 nobody nobody    0 Nov 10 22:41 spencer
-rw-r--r--  1 nobody nobody    0 Nov 10 17:56 the_sky
-rw-r--r--  1 nobody nobody    0 Nov 10 22:00 trond

Note that even though I didn't reboot adept, the idmapping is hosed.

[root@mrx ~]# chkconfig --list | grep idmap
rpcidmapd       0:off   1:off   2:off   3:on    4:on    5:on    6:off
[root@mrx ~]# ps -ef | grep idmap
root      1760     1  0 23:03 ?        00:00:00 rpc.idmapd
root      2864  2775  0 23:10 pts/2    00:00:00 grep idmap
[root@mrx ~]# service rpcidmapd restart
Shutting down RPC idmapd:                                  [  OK  ]
Starting RPC idmapd:                                       [  OK  ]
[root@mrx ~]# ls -la  /nfs4/mrx/loghyr
total 8
drwxrwxrwx  2 root   root   4096 Nov 12 22:54 .
drwxr-xr-x  3 root   root   4096 Nov 12 22:46 ..
-rw-r--r--  1 root   root      0 Nov 10 22:12 bruce
-rw-r--r--  1 tdh    users     0 Nov 10 17:47 it
-rw-r--r--  1 nobody nobody    0 Nov 10 21:26 my_heart
-rw-r--r--  1 nfsv4  users     0 Nov 12 22:54 sandman
-rw-r--r--  1 nfsv4  users     0 Nov 12 20:13 sky
-rw-r--r--  1 root   root      0 Nov 10 22:41 spencer
-rw-r--r--  1 tdh    users     0 Nov 10 17:56 the_sky
-rw-r--r--  1 nobody nobody    0 Nov 10 22:00 trond

So it is configured, it is running, but it isn't working.
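
Since ps and chkconfig both say everything is fine, the only reliable signal I have is the message log. Here is a Python sketch that picks the idmap failures out of /var/log/messages lines. It only knows the two message shapes seen in my logs above, and the function name is made up.

```python
import re


def idmap_failures(lines):
    """Return (source, errno) pairs for idmap failures found in log lines.

    source is "kernel" for the nfsd complaint, otherwise the rpc.idmapd
    routine that logged the error; errno is None for the kernel message.
    """
    failures = []
    for line in lines:
        if "nfsv4 idmapping failing" in line:
            failures.append(("kernel", None))
            continue
        match = re.search(r"rpc\.idmapd: (\w+): .*failed: errno (\d+)", line)
        if match:
            failures.append((match.group(1), int(match.group(2))))
    return failures


log = [
    "Nov 12 23:03:30 mrx rpc.idmapd: nfsdreopen: Opening '' failed: errno 2 (No such file or directory)",
    "Nov 12 23:05:45 mrx kernel: nfsd: nfsv4 idmapping failing: has idmapd not been started?",
    "Nov 12 22:43:59 mrx rpc.idmapd: Opened /proc/net/rpc/nfs4.idtoname/channel",
]
print(idmap_failures(log))
```

An empty list after a restart would match the working state, and the nfsdreopen failure at boot lines up with the broken one.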


Followup on Linux Server - 11/20/2005

Kevin Coffman added a new patch for nfs-utils.

This fixed the above problems with the Linux Id mapping.


This Site Is Best Viewed with a mug of hot tea and ANY Web Browser. Created with vi(m)


Tom Haynes ( nfsv4 (at) excfb (dot) com) 10 November 2005