Unix Insider –
Opera is something I do not appreciate fully. The costumes are exquisite, the music is emotional, but without understanding Italian the plots are hard to follow. Pavarotti could be performing a free-form exploration of the UUCP source code and I would have trouble distinguishing it from Madama Butterfly. Bugs Bunny making a fruit salad on Elmer Fudd's head is the most comprehensible opera I've witnessed. "Wait," my cultured friends tell me, "use the libretto to grasp the story." While it's not quite a set of Cliff Notes, the libretto (text of the opera) helps you build a framework for understanding the action on stage.
What does this have to do with the world of system administration? If the error messages, user questions, system-call errors, and other cryptic failures you encounter sometimes make as much sense as La Traviata, then you need a libretto -- a framework for understanding what the system is trying to tell you. We'll look at the various ways in which system calls fail, and the symptoms by which those failures manifest themselves. Starting with general file permission issues, we'll then dive down into NFS failures, and close with some comments on the importance of vigilance in enforcing system programming guidelines. You may not understand Puccini any better than before, but such help is easier to find.
System calls represent the boundary between user processes and operating system (kernel) services. When a process executes a system call, the associated wrapper in the libc.so library is called to perform some basic argument checking. If the call is syntactically acceptable, the wrapper executes a privileged instruction to force a trap into the kernel. From there, the operating system takes over by copying arguments, performing extensive checking, and completing the service request. If you dump out the code for a system call in libc.so, you'll see a "ta 8" instruction to issue trap 0x08, which is a system call (see /usr/include/sys/trap.h for trap types):
<font face="Courier">huey% adb /usr/lib/libc.so.*
_read,4?ia
_read:          st      %o0, [%sp + 0x44]
read+4:         mov     0x3, %g1
read+8:         ta      0x8
read+0xc:       bgeu    read + 0x40 </font>
Nearly every system call returns a single value, ranging from a pointer or an address, such as from
<font face="Courier">brk()</font>, to the size of a data transfer from
<font face="Courier">write()</font>, to a standard system type like a UID returned by
<font face="Courier">getuid()</font>. System calls that return integers usually return -1 to flag a failure, but this rule doesn't apply cleanly to calls that return addresses, which signal failure with a special pointer value such as NULL or (void *)-1. Simple, inconsistent indicators of success or failure don't give you (and your process) enough information to determine what went wrong and how to repair the situation, so the system call return value is supplemented by the error number, or errno value.
If an exception is encountered while processing the system call, errno is set to one of the values in /usr/include/sys/errno.h. A successful call does not reset errno to zero, so its value is meaningful only right after a call has reported a failure. Most applications include the errno.h header file, which defines the possible values of errno. Insert an
<font face="Courier">extern int errno;</font> in your code (on current systems, including errno.h alone is enough), and it is accessible as an integer variable.
In theory, your code should check the value of errno after each system call, including those that should "never" fail like
<font face="Courier">close()</font>, because these system calls can report failures deferred from other requests -- a topic we'll visit later. Of course, not all code does such paranoid checking, and you can't modify commercial applications to make them fit your quality standards ex post facto. So, how do you start tracking down a user issue when all you have is an error message?
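The return-value-plus-errno pattern described above can be sketched in C. This is a minimal illustration, not code from any particular application: open_checked() is a hypothetical helper name, and the choice to hand errno back as the function result is just one convention among several.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * open_checked() is a hypothetical helper, not part of any library.
 * It returns 0 on success, or the errno value of the call that failed.
 */
int open_checked(const char *path)
{
    int fd, saved;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd == -1) {
        saved = errno;          /* save it: later calls may overwrite errno */
        fprintf(stderr, "open(%s): %s (errno %d)\n",
                path, strerror(saved), saved);
        return saved;
    }

    /* Even close() gets its return value checked. */
    if (close(fd) == -1) {
        saved = errno;
        fprintf(stderr, "close(%s): %s (errno %d)\n",
                path, strerror(saved), saved);
        return saved;
    }
    return 0;
}
```

Note that errno is copied into a local variable before anything else runs; fprintf() is itself a library routine that may make system calls and disturb the value.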
The first thing to do is to become familiar with the various kinds of errors reported back through the errno mechanism. Your best source of information is the introduction to section 2 of the manual pages:
<font face="Courier">huey% man -s 2 intro </font>
It explains the possible error values and associates them with the cryptic error messages like "address already in use" printed by the
<font face="Courier">perror()</font> library routine. The descriptions aren't exhaustive and some of the errors are entirely non-obvious. Once you have a feel for the target, trace the program in question (in this case, probproc) using
<font face="Courier">huey% truss -o /tmp/tr.out probproc -a -X -i90 </font>
<font face="Courier">truss</font> dumps its output into the file named by the -o option.
<font face="Courier">trace</font>, the SunOS 4.1.x equivalent, doesn't follow forks or trace child processes, but
<font face="Courier">truss</font> will chase down a thread of execution until it has exited. Every system call is shown in the
<font face="Courier">truss</font> output, along with the arguments passed (or at least the first few bytes of them), the return value, and the value of errno, if it was set. Here's an edited
<font face="Courier">truss</font> spiel from an attempt to list a non-existent file:
<font face="Courier">execve("/usr/bin/ls", 0xEFFFFAE0, 0xEFFFFAEC)            argc = 2
open("/usr/lib/libintl.so.1", O_RDONLY, 035737754720)   = 3
ioctl(1, TIOCGWINSZ, 0x00024C84)                        Err#22 EINVAL
lstat("xyz", 0xEFFFF9A8)                                Err#2 ENOENT
_exit(2) </font>
Note that the process opens up the internationalization library, libintl.so.1, a good hint that it was linked with -lintl.
<font face="Courier">ls</font> attempts to get the current window size using the TIOCGWINSZ
<font face="Courier">ioctl()</font>, but gets an "invalid argument" because the example was generated on a dial-in line, not an
<font face="Courier">xterm</font>. Searching for the file information on "xyz" returns a "file not found" error, which is printed by
<font face="Courier">ls</font> on its way to a non-zero exit.
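The window-size probe in the trace is a good example of an error that a program is expected to shrug off. Here is a rough C sketch of that behavior; terminal_columns() and the 80-column fallback are assumptions made for illustration, not a description of how ls is actually written.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

#define FALLBACK_COLUMNS 80     /* assumed default, chosen for illustration */

/* Hypothetical helper: width of the terminal on fd, or the fallback. */
int terminal_columns(int fd)
{
    struct winsize ws;

    assert(fd >= 0);

    if (ioctl(fd, TIOCGWINSZ, &ws) == -1) {
        /* Not a window (or no size known): report it and carry on. */
        fprintf(stderr, "ioctl(TIOCGWINSZ): %s\n", strerror(errno));
        return FALLBACK_COLUMNS;
    }
    return ws.ws_col > 0 ? ws.ws_col : FALLBACK_COLUMNS;
}
```

The exact errno varies by system and by what the descriptor refers to (the trace above shows EINVAL; other systems report ENOTTY), which is one more reason to print the message instead of testing for a single value.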
Understanding errno isn't purely a serious business. One of the more popular contests at USENIX conferences has been creating new errno names.
Link dink post-shrink
One of the most frustrating exercises performed by system administrators is explaining (calmly) to users why applications that behave routinely on machine A suddenly fail or exhibit strange side effects on machine B. For well-known tools like the C shell, you can wade through .cshrc scripts and find minor environmental differences. But how do you deal with shrink-wrapped code? Use
<font face="Courier">truss</font> to identify the configuration and initialization files opened by the application. On the good machine, grep out the list of files opened and then match it against the same list on the problem machine:
<font face="Courier">huey% fgrep 'open(' truss.out1 > /tmp/out1
huey% fgrep 'open(' truss.out2 > /tmp/out2
huey% diff /tmp/out1 /tmp/out2 </font>
Look for the string "Err#2 ENOENT" signaling a missing file. Double check automounter maps, environment variables, and installation processes that modify files local to each machine, in /etc or /usr/lib, for example. Some applications search for configuration files in several directories, and may find identical files on the two hosts but process them in a different order. Again, checking the sequence of the
<font face="Courier">open()</font> calls and the ENOENT results will tell you if you have a configuration problem.
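To see why the search order matters, here is a small C sketch of how an application might walk a list of directories looking for its configuration file. The function name find_config() and the directory list are hypothetical; real applications each have their own search rules.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: look for `name` in each directory in turn.
 * Returns the index of the first directory holding a readable copy,
 * or -1 if none does. Every miss is logged with its errno text.
 */
int find_config(const char *const dirs[], int ndirs, const char *name)
{
    char path[1024];
    int i;

    assert(dirs != NULL && name != NULL);

    for (i = 0; i < ndirs; i++) {
        int n = snprintf(path, sizeof(path), "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;                   /* path too long: skip it */
        if (access(path, R_OK) == 0) {
            fprintf(stderr, "using %s\n", path);
            return i;
        }
        /* ENOENT means missing; EACCES means present but unreadable. */
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
    }
    return -1;
}
```

Two hosts with the same files but a different first hit will behave differently, and the per-directory messages line up with the string of failed open() calls you would see in the truss output.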
Also look for EACCES errors, caused by insufficient file or directory permissions. If the file exists but can't be read by the user, ensure that user and group IDs are consistent between the machines in question. Group-readable files aren't effective unless you enforce group membership on all machines at which users may camp.
Here's a nastier version of the same problem: a user is panicking to set up a demo environment. Rather than create new users and their environments, he runs the demo as root, only to have it fail miserably. Even root gets slapped with EACCES violations if the files being accessed are NFS mounted. Over the network, root by default becomes the anonymous user nobody, and relies on world read and execute permissions to open files and search directories. Any application that works for non-privileged users but fails for root is probably opening configuration or data files over NFS. If you suspect that NFS access is contributing to your problem, locate the filesystem for the file in question using
<font face="Courier">huey% df `dirname /usr/lib/gfx/config.common`
Filesystem                 kbytes    used   avail capacity  Mounted on
bigboy:/export/home/stern 1952573  944377  812946    54%    /home/stern </font>
Watch where you drop direct maps for the automounter, and where you use hierarchical maps that may deposit NFS mounts in the middle of someone's home directory. Applications that rely on making backup copies or renaming input data sets using hard links will fail if an NFS mount is introduced into the middle. For example, assume you are mounting home directories using the following hierarchical automounter map:
<font face="Courier">*       \
        /         homeserv:/export/home/& \
        /fxdata   dataserv:/export/datasets/fxdata </font>
When /home/stern is mounted, /home/stern/fxdata is picked up from the machine dataserv. So far, so good. But an application may assume that it can create a hard link between files in /home/stern/fxdata and /home/stern/backup, since they appear to be on the same filesystem. The
<font face="Courier">link()</font> system call fails, however, with EXDEV because the hard link would cross volume boundaries.
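An application that wants to survive this can treat EXDEV as a cue to copy the file instead of giving up. The following C sketch shows one way to do that; link_or_copy() and copy_file() are hypothetical names, and the copy is deliberately simple (it does not preserve ownership, modes, or timestamps).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst; used only when a hard link is impossible. */
static int copy_file(const char *src, const char *dst)
{
    char buf[8192];
    ssize_t n;
    int in, out, saved, rc = 0;

    if ((in = open(src, O_RDONLY)) == -1)
        return -1;
    if ((out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600)) == -1) {
        saved = errno;
        close(in);
        errno = saved;
        return -1;
    }
    while (rc == 0 && (n = read(in, buf, sizeof(buf))) > 0) {
        char *p = buf;
        while (n > 0) {                 /* write() may be partial */
            ssize_t w = write(out, p, (size_t)n);
            if (w == -1) {
                rc = -1;
                break;
            }
            p += w;
            n -= w;
        }
    }
    if (rc == 0 && n == -1)             /* read() failed */
        rc = -1;
    saved = errno;                      /* errno of the failed read/write, if any */
    close(in);
    if (close(out) == -1 && rc == 0) {  /* deferred write errors show up here */
        saved = errno;
        rc = -1;
    }
    if (rc == -1) {
        unlink(dst);                    /* don't leave a partial copy behind */
        errno = saved;
    }
    return rc;
}

/* Hypothetical helper: hard-link src to dst, copying if they are on different volumes. */
int link_or_copy(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    if (link(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;                      /* some other failure: caller checks errno */
    fprintf(stderr, "link: %s and %s are on different filesystems; copying\n",
            src, dst);
    return copy_file(src, dst);
}
```

Whether a silent fallback is appropriate depends on the application: a backup scheme that counts on both names sharing one inode would want to report EXDEV loudly instead.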
Trail of stale crumbs
NFS errors tend to be hard to resolve because you're assigning blame in more than one operating system and host environment. Here are some of the common pitfalls:
- "File not found" or ENOENT problems are almost always a client issue. Automounter maps, /etc/vfstab, or volume manager configuration files should be your starting points. If the client can't even find the file, the client is probably at fault.
- EACCES problems can be caused by inconsistent user and group numbering, or by the root-to-nobody mapping problems described above. While responsibility for presenting valid user and group values falls on the client, you're also bumping into name server and file server issues. Like ENOENT errors, these are reported by the calling process, showing up on the command line or in a dialog box created by the application.
- "NFS write error" or "Stale file handle" errors are server-specific. An error occurred on the server while handling the NFS request, causing it to fail. You'll see these reported in the console window and in the /var/adm/messages log, since the errors are noticed by the kernel's NFS Remote Procedure Call (RPC) code.
You have to follow the trail of network crumbs from the client back to the server to resolve server-specific errors. Your first step: get a general feeling for what went wrong using the NFS error number in the console message. NFS uses the standard errno values, so "NFS write error 28" is the same as ENOSPC, namely, the disk is full or the user exceeded his or her disk quota while writing a file. Many NFS errors have obvious explanations: bumping against quotas, filling up a disk, or a disk failure that results in a general I/O error. The more difficult one to chase is a stale file handle.
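The error-number lookup described above can be sketched in C. The helper names here are hypothetical, and the sketch assumes the console line has exactly the "NFS write error N on server X" shape shown in this article and that the number lines up with the local errno table.

```c
#include <assert.h>
#include <errno.h>              /* ENOSPC and friends */
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: pull the error number out of a console line such as
 * "NFS write error 28 on server bigboy". Returns the number, or -1 if the
 * line does not have that shape.
 */
int nfs_error_number(const char *line)
{
    int err;

    assert(line != NULL);

    if (sscanf(line, "NFS write error %d", &err) != 1)
        return -1;
    return err;
}

/* Print the errno text that goes with an NFS write error line. */
void explain_nfs_error(const char *line)
{
    int err = nfs_error_number(line);

    if (err < 0)
        printf("not an NFS write error: %s\n", line);
    else
        printf("error %d: %s%s\n", err, strerror(err),
               err == ENOSPC ? " (check df and quotas on the server)" : "");
}
```

Feeding it the sample line from this article, explain_nfs_error("NFS write error 28 on server bigboy"), prints the system's own text for ENOSPC.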
NFS file handles encode the server's filesystem ID and the file's inode number to uniquely identify each NFS-mounted file. Each inode also contains an inode generation number used to differentiate files that have re-used an inode. Delete a file, for example, /home/stern/summary, and then create a new file, say /home/stern/report on the same filesystem. The new file re-uses the same inode number as the previously deleted file (assuming no other file creation activity snuck in) but increments the inode generation number to distinguish it from the old, removed file.
What if a process on an NFS client machine had /home/stern/summary open when it was deleted on the server or by some other client? NFS has no record of
<font face="Courier">open()</font> activity, so it cannot notify the client that one of its open files has been removed. The next time the client sends a request with the file handle for the "summary" file, the NFS server recognizes that the handle contains an inode generation number that no longer matches the current generation, and it returns a stale file handle error. You'll also end up with stale file handles when the inode is no longer valid, for example, if a file is removed but the newly freed inode has not been re-used.
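The generation-number check is easier to follow with a toy model. The C sketch below is not NFS server code: the structures, field names, and the eight-inode "filesystem" are all invented for illustration, and the filesystem ID is carried in the handle but not checked.

```c
#include <assert.h>
#include <errno.h>

#define NINODES 8               /* toy filesystem size, for illustration */

/* Simplified stand-ins for the real structures; field names are invented. */
struct toy_inode  { int in_use; unsigned gen; };
struct toy_handle { unsigned fsid; unsigned ino; unsigned gen; };

static struct toy_inode inodes[NINODES];

/* Allocate the first free inode, bumping its generation number. */
struct toy_handle toy_create(unsigned fsid)
{
    struct toy_handle fh = { fsid, 0, 0 };
    unsigned i;

    for (i = 0; i < NINODES; i++) {
        if (!inodes[i].in_use) {
            inodes[i].in_use = 1;
            inodes[i].gen++;
            fh.ino = i;
            fh.gen = inodes[i].gen;
            return fh;
        }
    }
    assert(!"toy filesystem is full");
    return fh;
}

/* Free the inode; the generation number stays put until it is re-used. */
void toy_remove(struct toy_handle fh)
{
    assert(fh.ino < NINODES);
    inodes[fh.ino].in_use = 0;
}

/* Returns 0 if the handle is still good, ESTALE otherwise. */
int toy_check(struct toy_handle fh)
{
    assert(fh.ino < NINODES);
    if (!inodes[fh.ino].in_use || inodes[fh.ino].gen != fh.gen)
        return ESTALE;
    return 0;
}
```

Create "summary", remove it, then create "report": the second file lands on the same inode with a new generation number, so the old handle fails toy_check() both before and after the inode is re-used.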
If you want to watch a network crumble, try restoring an NFS-exported filesystem onto a pristine filesystem without rebooting NFS clients using it. When the new filesystem is created,
<font face="Courier">newfs</font> runs a utility called
<font face="Courier">fsirand</font> to randomly seed the inode generation numbers. During the restore process, files are attached to the first available inode, not necessarily the same inode number they had in the old filesystem. Every client that has an open file handle on the restored filesystem will see stale file handles, since either the inode number or generation number will be mismatched. Clients will hammer away at the network, retrying NFS requests that fail, unable to determine how to fix the stale file handle problems. Your only recourse is to reboot the net-world and let the clients acquire new handles.
How do you associate an NFS error with a client process? First, identify the file in question on the server. In SunOS 4.1.x, the
<font face="Courier">showfh</font> utility takes a file handle and resolves it to a file on the NFS server. However, the RPC daemon used by showfh (
<font face="Courier">rpc.showfhd</font>) isn't started by default, and it frequently times out due to the long search time required to find the inode in question. An easier approach is to use a server-side script called
<font face="Courier">fhfind</font>, written by Sun's Brent Callaghan (creator of the automounter), that takes a file handle and locates the file associated with it. For example, let's say that you're seeing:
<font face="Courier">NFS write error 28 on server bigboy 1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d </font>
Error 28 is ENOSPC, so you're out of disk space. Running
<font face="Courier">df</font> on the server verifies that problem. Your job: Get the writing client to ease up so you can clean up. On server bigboy, run
<font face="Courier">fhfind</font> to identify the file represented by the file handle:
<font face="Courier">bigboy# fhfind 1540002 2 a0000 4f77 48df4455 a0000 2 25d1121d
/export/home/stern/summary </font>
<font face="Courier">fhfind</font> can take quite a while, particularly for large filesystems, because it does a
<font face="Courier">find</font> on every file to locate the inode number. On the client reporting the error, use the
<font face="Courier">fuser</font> utility to find the process holding this file open:
<font face="Courier">huey# fuser /home/stern/summary
/home/stern/summary: 10543o </font>
We can get more detail via the lsof utility:
<font face="Courier">huey# lsof /home/stern/summary
COMMAND   PID  USER  FD  TYPE      DEVICE  SIZE/OFF  INODE/NAME
reptool 12582 stern  3r  VREG  0x022000a9       158  68376 /home/stern (sugar:/export/home/stern) </font>
<font face="Courier">lsof</font> shows us the file descriptor number used to hold the file open, as well as some information normally included with
<font face="Courier">ps</font>. Look for open files of type VREG in
<font face="Courier">lsof</font>'s output, noting that these are regular open files. Entries marked with a type of VDIR are current directories, and are probably not the source of your problem.
There is a drawback to this approach: stale file handles can't be found using the
<font face="Courier">fhfind</font> script. Inodes associated with stale file handles either aren't valid, and therefore can't be found by searching the filesystem, or have been re-used by a new file, possibly with a different name. In this case the best tactic is to narrow down the process candidates using
<font face="Courier">lsof</font> to find those with NFS files open:
<font face="Courier">huey# lsof -N | fgrep VREG </font>
Look for file descriptors (in the FD column) with a
<font face="Courier">w</font> in them, indicating the file has been opened for writing. You don't really need the filename for the stale file handle; it may not even exist at this point. Just take the inode numbers reported by
<font face="Courier">lsof</font> and match them against the inode numbers pulled from the stale file handle error messages on the console. Use this script to convert a file handle into a server inode number:
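<font face="Courier">#! /bin/sh
# fh2inode - convert NFS file handle to inode
# The file handle fields are passed as arguments; field 4 is the inode in hex.
# bc needs upper-case hex digits, so fold the case first.
fh=`echo "$4" | tr '[a-z]' '[A-Z]'`
echo "ibase=16;$fh" | bc </font>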
If the server exports more than one filesystem, you'll need to find the volume associated with the stale handle. The first value in the file handle is a filesystem ID; match it to the mounted filesystem ID values in /etc/mnttab to locate the volume on which you're experiencing an error.
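If you would rather pull the fields apart in C, the same extraction can be sketched as follows. fh_field() is a hypothetical helper, and it assumes the handle is printed as space-separated hexadecimal fields in the layout shown in this article, with the filesystem ID first and the inode number fourth.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: store field number `n` (counting from 1) of a
 * space-separated hexadecimal file handle in *out. Returns 0 on success,
 * or -1 if the field is missing or is not a hex number.
 */
int fh_field(const char *fh, int n, unsigned long *out)
{
    const char *p = fh;
    char *end;
    int i;

    assert(fh != NULL && out != NULL && n >= 1);

    for (i = 1; i < n; i++) {           /* skip the first n-1 fields */
        p += strspn(p, " ");
        p += strcspn(p, " ");
    }
    p += strspn(p, " ");

    if (!isxdigit((unsigned char)*p))   /* missing field, sign, or junk */
        return -1;

    errno = 0;                          /* clear before the call we test */
    *out = strtoul(p, &end, 16);
    if (errno != 0 || (*end != ' ' && *end != '\0'))
        return -1;
    return 0;
}
```

For the handle in the earlier console message, field 1 gives the filesystem ID to look up in /etc/mnttab and field 4 (4f77) gives inode 20343 to compare with the lsof output.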
As soon as you've found the process writing to a stale file handle, clean up gently by polling the user, then killing or restarting the process.
close() to the edge
Detecting errors while writing to a file is complex for both NFS and local filesystems. Unix does asynchronous writes, that is, the writes are stacked up by the operating system and flushed out periodically. On local disks, the
<font face="Courier">update</font> daemon runs every 30 seconds to force pending writes to disk. With NFS, the kernel threads (Solaris 2.x) or biod processes (SunOS 4.1.x) queue writes locally. What happens if an error occurs during the completion of one of the
<font face="Courier">write()</font> system calls? In short, the error is reported back on the next
<font face="Courier">write()</font> system call or on the call to
<font face="Courier">close()</font>. You're guaranteed to see any errors by the time
<font face="Courier">close()</font> returns, because all pending writes are flushed (converted to synchronous writes) when the file is closed.
How often does your code check the return value from
<font face="Courier">close()</font>? Again, this is an issue for local disks and NFS filesystems, although you're more likely to see problems with NFS since most error checking is done by the server, after the request has been buffered and subsequently flushed by the client. If you fill a disk or exceed a quota, you run the risk of having an NFS write fail undetected unless you check return and errno values from <font face="Courier">close()</font>.
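Here is a minimal C sketch of what that checking looks like. write_and_close() is a hypothetical helper, and returning the errno value of the first failure is just one reasonable convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer, then close the descriptor.
 * Returns 0 only if every write() and the close() succeeded; otherwise
 * returns the errno value of the first failure.
 */
int write_and_close(int fd, const char *buf, size_t len)
{
    int err = 0;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;               /* interrupted: try again */
            err = errno;
            break;
        }
        buf += n;                       /* write() may be partial */
        len -= (size_t)n;
    }

    /* Deferred write errors (ENOSPC, EDQUOT, EIO) can surface here. */
    if (close(fd) == -1 && err == 0)
        err = errno;

    if (err != 0)
        fprintf(stderr, "write_and_close: %s\n", strerror(err));
    return err;
}
```

The descriptor is closed even when a write fails, so the caller never has to guess whether it still owns fd. For data on a local disk that must not be lost, calling fsync() before close() is the usual way to force the flush and see its result.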
The moral of the story is religious enforcement of standards for system programming, paying particular attention to error checking. If you want to become the patron saint of errno, here are some guiding principles:
- Explicitly clear errno before a system call for which you'll check errno. The value of errno is not set to zero after a successful call, so you may end up testing a value set several system calls earlier. Limit the window in which you're exposed to side effects by keeping the errno reset and the system call as close together as possible. If you intersperse library calls, you may be inadvertently setting errno via a system call made from the library routine. When errno tests produce unexpected results, use
<font face="Courier">truss</font> to make sure that the value you're testing was produced by the system call immediately preceding the test.
- Beware of interactions between system tuning efforts and errors. If you change the TCP keepalive timer interval in an effort to get socket addresses re-used more quickly, you'll eliminate EADDRINUSE errors but will generate more network traffic for the keepalive probes.
- Remember that the whole world doesn't speak English, and link with
<font face="Courier">-lintl</font>. The
<font face="Courier">perror()</font> library routine looks up error messages in an internationalized library, accessed if libintl.so is linked in. If your system or application sets its locale, you can see Unix error messages in French, German, Italian, or other languages. (And you thought the opera reference was a non sequitur.)
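The first principle is easiest to see with a call whose return value can legitimately be -1. The sketch below uses getpriority(), where -1 is both a valid nice value and the failure indicator, so errno is the only way to tell them apart. current_nice() is a hypothetical wrapper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: fetch this process's nice value.
 * Returns 0 and stores the value in *out, or returns -1 on a real failure.
 */
int current_nice(int *out)
{
    int prio;

    assert(out != NULL);

    errno = 0;                          /* clear right before the call */
    prio = getpriority(PRIO_PROCESS, 0);
    if (prio == -1 && errno != 0) {     /* test right after the call */
        fprintf(stderr, "getpriority: %s\n", strerror(errno));
        return -1;
    }
    *out = prio;
    return 0;
}
```

Nothing sits between the reset, the call, and the test, so a stale errno left over from some earlier failure can't be mistaken for a getpriority() error.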
Rigorous adherence to good system programming practices prevents odd failures due to unexpected input or output conditions. Nobody plans for their code to handle disk overflows, but these deficiencies become clear at the worst possible moment, when the system -- and you -- are under maximum stress. Unresolved error conditions are the ones that cause loss of data, jobs, or your sanity. Keep your users abreast of your system style guide, and you might just have time to appreciate those great operatic moments.