Errno Libretto

By Hal Stern, Unix Insider |  Operating Systems

What if a process on an NFS client machine had
/home/stern/summary open when it was deleted on the server or
by some other client? NFS has no record of open()
activity, so it cannot notify the client that one of its open files has
been removed. The next time the client sends a request with the file
handle for the "summary" file, the NFS server recognizes that the
handle contains an inode generation number that no longer matches the
current generation, and it returns a stale file handle error. You'll
also end up with stale file handles when the inode is no longer valid;
for example, when a file has been removed but the newly freed inode has
not yet been re-used.
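
The two stale-handle cases above can be illustrated with a toy model. This is only a sketch: ToyServer, FileHandle, and the inode number 4711 are hypothetical names made up for the example, a handle is reduced to an inode number plus a generation number (a real handle also identifies the filesystem), and bumping the generation on each re-use is an assumption about how a server typically behaves.

```python
import errno
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    """Hypothetical, simplified handle: inode number plus generation."""
    inode: int
    generation: int


class ToyServer:
    """Toy stand-in for the server's inode table (not real NFS code)."""

    def __init__(self):
        self.generation = {}  # inode number -> generation of its latest use
        self.in_use = set()   # inode numbers that currently hold a file

    def create(self, inode):
        """Allocate or re-use an inode, bumping its generation number."""
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.in_use.add(inode)
        return FileHandle(inode, self.generation[inode])

    def remove(self, inode):
        """Free the inode; handles that clients already hold are not told."""
        self.in_use.discard(inode)

    def check(self, handle):
        """Raise ESTALE if the handle no longer names a live file."""
        if (handle.inode not in self.in_use
                or self.generation[handle.inode] != handle.generation):
            raise OSError(errno.ESTALE, os.strerror(errno.ESTALE))


def is_stale(server, handle):
    """Return True if the server rejects the handle with ESTALE."""
    try:
        server.check(handle)
    except OSError as exc:
        return exc.errno == errno.ESTALE
    return False


server = ToyServer()
summary = server.create(inode=4711)            # client obtains a handle
fresh_before = not is_stale(server, summary)   # handle works while file exists

server.remove(4711)                            # file deleted elsewhere
stale_when_freed = is_stale(server, summary)   # inode no longer valid

replacement = server.create(inode=4711)        # inode re-used by a new file
stale_when_reused = is_stale(server, summary)  # generation no longer matches
```

In this model the old handle is rejected in both situations, while the handle for the replacement file, which carries the newer generation number, is accepted.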

If you want to watch a network crumble, try restoring an NFS-exported
filesystem onto a pristine filesystem without rebooting the NFS clients
that use it. When the new filesystem is created, newfs runs
a utility called fsirand to randomly seed the inode
generation numbers. During the restore process, files are attached to
the first available inode, not necessarily the same inode number they
had in the old filesystem. Every client that has an open file handle
on the restored filesystem will see stale file handles, since either
the inode number or generation number will be mismatched. Clients will
hammer away at the network, retrying NFS requests that fail, unable to
determine how to fix the stale file handle problems. Your only
recourse is to reboot the net-world and let the clients acquire new
handles.
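
A small simulation shows why every pre-restore handle goes stale. It is a sketch under stated assumptions: the function names, the file names other than "summary", the inode numbers, the 32-bit generation width, and the choice to start numbering inodes at 2 are all invented for the illustration and do not come from newfs, fsirand, or restore themselves.

```python
import random


def new_filesystem(num_inodes, rng):
    """Model newfs plus fsirand: all inodes free, generations seeded randomly."""
    return {
        "generation": {ino: rng.getrandbits(32)
                       for ino in range(2, 2 + num_inodes)},
        "files": {},  # inode number -> name of the file stored there
    }


def restore(fs, names):
    """Attach each restored file to the first available inode."""
    handles = {}
    for name in names:
        ino = min(i for i in fs["generation"] if i not in fs["files"])
        fs["files"][ino] = name
        handles[name] = (ino, fs["generation"][ino])
    return handles


def is_stale(fs, handle):
    """A handle is stale if its inode is unused or its generation differs."""
    ino, gen = handle
    return ino not in fs["files"] or fs["generation"][ino] != gen


# Handles that clients held before the restore: (inode, generation).
old_handles = {"summary": (2, 7), "notes": (17, 3), "report": (42, 11)}

fs = new_filesystem(num_inodes=64, rng=random.Random(1))
new_handles = restore(fs, sorted(old_handles))

stale = sorted(name for name, handle in old_handles.items()
               if is_stale(fs, handle))
```

Here "notes" and "report" land on different inode numbers than before, and "summary" points at an inode that is in use again but carries a freshly seeded generation number, so all three old handles are rejected.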

How do you associate an NFS error with a client process? First,
identify the file in question on the server. In SunOS 4.1.x, the
showfh utility takes a file handle and resolves it to a
file on the NFS server. However, the RPC daemon used by
showfh (rpc.showfhd) isn't started by
default, and it frequently times out due to the long search time
required to find the inode in question. An easier approach is to use a
server-side script called fhfind, written by Sun's Brent Callaghan
(creator of the automounter), that takes a file handle and locates the
file associated with it. For example, let's say that you're seeing:

NFS write error 28 on server bigboy 1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d
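
A console message of this shape can be picked apart mechanically. The following is a minimal sketch: the regular expression is inferred from the single example line above rather than from any documented format, parse_nfs_error is a hypothetical helper, and the symbolic name is looked up in the errno table of the machine running the script, which may number errors differently from the NFS client that logged the message.

```python
import errno
import os
import re

# Pattern inferred from the one sample message; other releases may differ.
NFS_ERROR_RE = re.compile(
    r"NFS (?P<op>\w+) error (?P<code>\d+) on server (?P<server>\S+)\s+(?P<fh>.+)"
)


def parse_nfs_error(line):
    """Split an NFS error message into operation, errno, server, and handle."""
    match = NFS_ERROR_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "op": match.group("op"),
        "errno": code,
        # Symbolic name and text according to the local errno table.
        "name": errno.errorcode.get(code, "UNKNOWN"),
        "text": os.strerror(code),
        "server": match.group("server"),
        # The words of the file handle, ready to pass to a lookup tool.
        "handle": match.group("fh").split(),
    }


info = parse_nfs_error(
    "NFS write error 28 on server bigboy "
    "1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d"
)
```

The "handle" entry keeps the words of the file handle in their original order, which is convenient when they have to be handed to a server-side lookup script.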

Error 28 is ENOSPC, so you're out of disk space. Running
df on the server verifies that problem. Your job: Get the
writing client to ease up so you can clean up. On server
bigboy, run fhfind to identify the file
represented by the file handle:
