What if a process on an NFS client machine had /home/stern/summary open when it was deleted on the server or by some other client? NFS has no record of
<font face="Courier">open()</font> activity, so it cannot notify the client that one of its open files has been removed. The next time the client sends a request with the file handle for the "summary" file, the NFS server recognizes that the handle contains an inode generation number that no longer matches the current generation, and it returns a stale file handle error. You'll also end up with stale file handles when the inode is no longer valid, for example, if a file is removed but the newly freed inode has not been re-used.
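The check described above can be pictured with a small sketch. The structures and the `fh_check()` function are hypothetical simplifications invented for illustration, not the server's actual data layout; the only facts taken from the text are that a handle carries an inode generation number, and that a mismatch or a freed inode yields a stale file handle error.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified view of what a file handle carries. */
struct fh_sketch {
    uint32_t fsid;        /* filesystem ID */
    uint32_t inode;       /* inode number on the server */
    uint32_t generation;  /* inode generation when the handle was issued */
};

/* Hypothetical, simplified view of the server's current inode state. */
struct inode_sketch {
    int      in_use;      /* 0 if the inode is currently free */
    uint32_t generation;  /* current generation number of the inode */
};

/* Return 0 if the handle still names the same file, ESTALE otherwise. */
int fh_check(const struct fh_sketch *fh, const struct inode_sketch *ip)
{
    if (!ip->in_use)
        return ESTALE;    /* file removed, inode not yet re-used */
    if (ip->generation != fh->generation)
        return ESTALE;    /* inode re-used by a different file */
    return 0;
}
```

Both failure paths return the same error, which is why the client cannot tell a deleted file from a re-used inode.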
If you want to watch a network crumble, try restoring an NFS-exported filesystem onto a pristine filesystem without rebooting NFS clients using it. When the new filesystem is created,
<font face="Courier">newfs</font> runs a utility called
<font face="Courier">fsirand</font> to randomly seed the inode generation numbers. During the restore process, files are attached to the first available inode, not necessarily the same inode number they had in the old filesystem. Every client that has an open file handle on the restored filesystem will see stale file handles, since either the inode number or generation number will be mismatched. Clients will hammer away at the network, retrying NFS requests that fail, unable to determine how to fix the stale file handle problems. Your only recourse is to reboot the net-world and let the clients acquire new handles.
How do you associate an NFS error with a client process? First, identify the file in question on the server. In SunOS 4.1.x, the
<font face="Courier">showfh</font> utility takes a file handle and resolves it to a file on the NFS server. However, the RPC daemon used by <font face="Courier">showfh</font> (
<font face="Courier">rpc.showfhd</font>) isn't started by default, and it frequently times out due to the long search time required to find the inode in question. An easier approach is to use a server-side script called
<font face="Courier">fhfind</font>, written by Sun's Brent Callaghan (creator of the automounter), that takes a file handle and locates the file associated with it. For example, let's say that you're seeing:
<font face="Courier">NFS write error 28 on server bigboy 1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d </font>
Error 28 is ENOSPC, so you're out of disk space. Running
<font face="Courier">df</font> on the server verifies that problem. Your job: Get the writing client to ease up so you can clean up. On server bigboy, run
<font face="Courier">fhfind</font> to identify the file represented by the file handle:
<font face="Courier">bigboy# fhfind 1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d
/export/home/stern/summary </font>
<font face="Courier">fhfind</font> can take quite a while, particularly for large filesystems, because it does a
<font face="Courier">find</font> on every file to locate the inode number. On the client reporting the error, use the
<font face="Courier">fuser</font> utility to find the process holding this file open:
<font face="Courier">huey# fuser /home/stern/summary
/home/stern/summary: 10543o </font>
We can get more detail via the <font face="Courier">lsof</font> utility:
<font face="Courier">huey# lsof /home/stern/summary
COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE/NAME
reptool 12582 stern 3r VREG 0x022000a9 158 68376 /home/stern (sugar:/export/home/stern) </font>
<font face="Courier">lsof</font> shows us the file descriptor number used to hold the file open, as well as some information normally included with
<font face="Courier">ps</font>. Look for open files of type VREG in
<font face="Courier">lsof</font>'s output; these are regular files. Entries marked with a type of VDIR are current directories, and are probably not the source of your problem.
There is a drawback to this approach: stale file handles can't be found using the
<font face="Courier">fhfind</font> script. Inodes associated with stale file handles either aren't valid, and therefore can't be found by searching the filesystem, or have been re-used by a new file, possibly with a different name. In this case the best tactic is to narrow down the process candidates using
<font face="Courier">lsof</font> to find those with NFS files open:
<font face="Courier">huey# lsof -N | fgrep VREG </font>
Look for file descriptors (in the FD column) with a
<font face="Courier">w</font> in them, indicating the file has been opened for writing. You don't really need the filename for the stale file handle; it may not even exist at this point. Just take the inode numbers reported by
<font face="Courier">lsof</font> and match them against the inode numbers pulled from the stale file handle error messages on the console. Use this script to convert a file handle into a server inode number:
<font face="Courier">#! /bin/sh
# fh2inode - convert NFS file handle to inode
fh=`echo $4 | tr '[a-z]' '[A-Z]'`
echo "ibase=16;$fh" | bc </font>
If the server exports more than one filesystem, you'll need to find the volume associated with the stale handle. The first value in the file handle is a filesystem ID; match it to the mounted filesystem ID values in /etc/mnttab to locate the volume on which you're experiencing an error.
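For readers who would rather do the conversion in a program, here is a minimal C sketch of the same idea. It assumes the handle is given as the space-separated hex words shown in the console message, with the filesystem ID in the first word and the inode number in the fourth, as described above. The function name `fh_parse()` is made up for this example.

```c
#include <stdio.h>

/*
 * Pull the filesystem ID (first word) and inode number (fourth word) out of
 * a file handle printed as space-separated hex words. Returns 0 on success,
 * -1 if the string has fewer than four parseable words.
 */
int fh_parse(const char *fh, unsigned long *fsid, unsigned long *inode)
{
    unsigned long w[4];

    if (sscanf(fh, "%lx %lx %lx %lx", &w[0], &w[1], &w[2], &w[3]) != 4)
        return -1;

    *fsid = w[0];     /* match against the mounted filesystem IDs */
    *inode = w[3];    /* compare with the inode number lsof reports */
    return 0;
}
```

Printing `*inode` with `%lu` gives the decimal form to compare against the inode numbers reported by lsof.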
As soon as you've found the process writing to a stale file handle, clean up gently by polling the user, then killing or restarting the process.
close() to the edge
Detecting errors while writing to a file is complex for both NFS and local filesystems. Unix does asynchronous writes, that is, the writes are stacked up by the operating system and flushed out periodically. On local disks, the
<font face="Courier">update</font> daemon runs every 30 seconds to force pending writes to disk. With NFS, the kernel threads (Solaris 2.x) or biod processes (SunOS 4.1.x) queue writes locally. What happens if an error occurs during the completion of one of the
<font face="Courier">write()</font> system calls? In short, the error is reported back on the next
<font face="Courier">write()</font> system call or on the call to
<font face="Courier">close()</font>. You're guaranteed to see any errors by the time
<font face="Courier">close()</font> returns, because all pending writes are flushed (converted to synchronous writes) when the file is closed.
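In code, that means treating close() as one more place a write can fail. The following is a minimal sketch rather than a complete I/O layer; `write_and_close()` is a hypothetical helper name, and the retry on EINTR is an added assumption about how you would want interrupted writes handled.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes from buf to fd, then close fd. Returns 0 on success or the
 * errno value of the first failure. The descriptor is closed in either case.
 */
int write_and_close(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    int err = 0;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before writing; retry */
            err = errno;       /* may belong to an earlier buffered write */
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    /* close() flushes pending writes, so a full disk can surface here. */
    if (close(fd) < 0 && err == 0)
        err = errno;

    return err;
}
```

A caller that gets a nonzero return should assume the data did not all reach the server, even if every write() appeared to succeed.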
How often does your code check the return value from
<font face="Courier">close()</font>? Again, this is an issue for local disks and NFS filesystems, although you're more likely to see problems with NFS since most error checking is done by the server, after the request has been buffered and subsequently flushed by the client. If you fill a disk or exceed a quota, you run the risk of having an NFS write fail undetected unless you check return and errno values from <font face="Courier">close()</font>.
The moral of the story is religious enforcement of standards for system programming, paying particular attention to error checking. If you want to become the patron saint of errno, here are some guiding principles:
- Explicitly clear errno before a system call for which you'll check errno. The value of errno is not set to zero after a successful call, so you may end up testing a value set several system calls earlier. Limit the window to which you're exposed to side effects by keeping the system call and errno reset as close as possible. If you intersperse library calls, you may be inadvertently setting errno via a system call made from the library routine. When errno tests produce unexpected results, use
<font face="Courier">truss</font> to make sure that the value you're testing was produced by the system call immediately preceding the test.
- Beware of interactions between system tuning efforts and errors. If you change the TCP keepalive timer interval in an effort to get socket addresses re-used more quickly, you'll eliminate EADDRINUSE errors but will generate more network traffic for the keepalive probes.
- Remember that the whole world doesn't speak English, and link
<font face="Courier">-lintl</font>. The
<font face="Courier">perror()</font> library routine looks up error messages in an internationalized library, accessed if libintl.so is linked in. If your system or application sets its locale, you can see Unix error messages in French, German, Italian, or other languages. (And you thought the opera reference was a non sequitur.)
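The first principle is easiest to see with a call whose return value alone can't signal failure. The sketch below uses getpriority(), chosen for illustration because -1 is a legitimate nice value, so errno is the only way to tell success from failure. The wrapper name `read_priority()` is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Fetch the nice value of process `who` (0 means the caller). Returns 0 and
 * stores the value in *prio on success, or returns the errno from the call.
 */
int read_priority(id_t who, int *prio)
{
    int value, saved;

    errno = 0;                               /* reset right before the call */
    value = getpriority(PRIO_PROCESS, who);
    saved = errno;                           /* capture before any library call */

    if (value == -1 && saved != 0) {
        fprintf(stderr, "getpriority: %s\n", strerror(saved));
        return saved;
    }

    *prio = value;
    return 0;
}
```

Copying errno into a local variable before calling fprintf() keeps the library routine from overwriting the value you intend to report.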
Rigorous adherence to good system programming practices prevents odd failures due to unexpected input or output conditions. Nobody plans for their code to handle disk overflows, but these deficiencies become clear at the worst possible moment, when the system -- and you -- are under maximum stress. Unresolved error conditions are the ones that cause loss of data, jobs, or your sanity. Keep your users abreast of your system style guide, and you might just have time to appreciate those great operatic moments.