Time bombs

Unix Insider –

About ten years ago, Dennis Ritchie posted the question "What time is it?" on net.general, then the catch-all for USENET traffic of interest to the masses. Dennis was poking fun at messages (and their authors) that arrived several days after bearing any sense of relevance. The "what time is it?" probe demonstrated that USENET isn't a good source for a watch-setting consensus. The underlying problem, however, is not setting just one clock on one machine, but instead keeping an entire network of machines synchronized with respect to a global, reliable time source and with respect to each other.

Accurate time-keeping affects many day-to-day functions as well as issues critical to system administrators:

  • File modification times are consulted by <font face="Courier">make</font>, so skewed desktop clocks may result in erratic actions.
  • Secure RPC, NIS+, and Secure NFS rely on timestamps for verifying requests. Client clocks that stray into network tardiness result in rejected requests.
  • Digital signatures or transaction timestamps that are used to sequence events must refer to a time base that is not machine dependent.
  • Attempts to correlate network activity, performance data, and system log messages from multiple hosts require a single, global timepiece for the network. You can't track a performance problem across several machines if you can't lay out events on a single, absolute timeline.

Put simply: how do you keep the software watches of hundreds or thousands of systems synchronized, particularly when they are spread around the world, and how do you make sure that your sense of time matches that of the rest of the computing world? This month, we address the question of what to do when your sense of time bombs. We'll start with an overview of how Unix systems keep track of time, look at a simple method for synchronizing Unix system clocks, and take a peek at the Network Time Protocol, a small dose of Swiss perfection you can bring to most Unix and Windows NT machines.

A brief history of time, in 32 bits or less

Time is suddenly in vogue for technical discussion. Impending software meltdowns due to the two-digit year rollover in 2000, and estimates of the amount of work required to fix every picayune date reference have displaced Internet mania on the covers of our trade press. The hysteria has clouded the issue slightly, since the problem with date rollover only exists if your systems do math on two-digit year values. If you designed a database with the year ranging from 00-99, you will have saved a few bytes per row but created a software headache for someone else. Lotus 1-2-3, Microsoft Excel, and other products skirt the 1999-party-over problem by using 100 to refer to 2000, 101 for 2001, and so on, extending date fields to three digits when the millennium arrives.

On Unix systems, the current time is decoupled from any particular year or representation. The time is stored as a signed 32-bit value representing the number of seconds elapsed since January 1, 1970. The year 2000 isn't a problem, but in January 2038 the Unix time counter will roll over and cause another trade press crisis. The time counter is relative to Greenwich Mean Time (more precisely, Coordinated Universal Time, or UTC) and is converted into local time using the information in the timezone file /etc/timezone. Choosing UTC as a baseline means that (in theory) all Unix systems should have exactly the same 32-bit time counter at any point in time. We'll see how that simplifies time management protocols later on.
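The rollover date follows from simple arithmetic, sketched here in POSIX shell. This is only an illustration: it assumes a signed 32-bit seconds counter, a shell whose arithmetic is 64 bits wide, and days of exactly 86,400 seconds. The function names are invented for this example.

```shell
#!/bin/sh
# Largest value a signed 32-bit seconds counter can hold.
max_time32() {
    echo $(( (1 << 31) - 1 ))
}

# Whole days elapsed since January 1, 1970, for a counter value.
days_since_epoch() {
    echo $(( $1 / 86400 ))
}

# Seconds left over within the final, partial day.
secs_into_day() {
    echo $(( $1 % 86400 ))
}

max=$(max_time32)
echo "counter limit: $max seconds"
echo "days after 1970-01-01: $(days_since_epoch "$max")"
echo "seconds into that day: $(secs_into_day "$max")"
```

The limit works out to 24855 days plus 11647 seconds after the epoch, which is January 19, 2038, at 03:14:07 GMT.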

Accurate time keeping affects nearly all of the file-based operations on a Unix system. Every Unix file has its modification and access times recorded in the inode. The modification time is used by backup utilities, NFS, and the local virtual memory management system for maintaining consistency of in-memory file cache pages. Note that creation times are not kept in the filesystem. If you want an audit trail of when a file was created and modified, you'll need to use a source code control system such as SCCS that tracks each change.

Given that system clocks tend to drift apart, NFS client systems compensate for "impossible" clock values with slight modifications to basic commands like <font face="Courier">ls</font>. Normally, <font face="Courier">ls</font> shows you the month, day, and modification time of a file if it is less than six months old; otherwise it shows you the month, day, and year:

<font face="Courier">
duey% ls -l 
total 38
-rw-r--r--   1 stern    user      18631 Apr 15  1995 time.txt

What if you create a file on a machine whose clock is slightly ahead? You'll confuse <font face="Courier">ls</font>, since its subtraction of modification time from current time yields a negative number. The NFS-aware <font face="Courier">ls</font> accepts a clock drift window of a few minutes, and does the visually right thing with such files.
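The decision described above might look something like the following sketch. The article doesn't give the actual logic or the exact drift window used by any particular <font face="Courier">ls</font>, so the 300-second default, the 182-day cutoff, and the function name are all assumptions made for illustration.

```shell
# Pick an ls-style date format for a file.
# $1 = current time, $2 = file modification time (seconds since 1970)
# $3 = tolerated forward drift in seconds (hypothetical default: 300)
ls_date_style() {
    now=$1
    mtime=$2
    window=${3:-300}
    six_months=$(( 182 * 86400 ))   # rough six-month cutoff
    age=$(( now - mtime ))
    if [ "$age" -lt 0 ] && [ $(( -age )) -le "$window" ]; then
        echo "time"     # slightly in the future: treat as just modified
    elif [ "$age" -lt 0 ] || [ "$age" -gt "$six_months" ]; then
        echo "year"     # far in the future, or old: show month, day, year
    else
        echo "time"     # recent: show month, day, time
    fi
}
```

A file stamped 60 seconds ahead of the local clock falls inside the window and is listed like any other fresh file, while one stamped an hour ahead is not.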

If you're bothered by inconsistent file modification times, you can always explicitly set them using <font face="Courier">touch</font> (or perhaps <font face="Courier">/usr/5bin/touch</font> to access the System V version). To find all of the files modified since a certain time, use <font face="Courier">find</font> and a timestamp file. The following example creates a timestamp file dated April 15, 2:45 PM, of the current year (1996 here), and then prints out files in /home/stern that have modification times more recent than the timestamp:

<font face="Courier">
duey% touch 04151445 /tmp/timestamp 
duey% find /home/stern -newer /tmp/timestamp -print
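On systems that follow POSIX, the same idea is usually written with <font face="Courier">touch -t</font>, which takes the year explicitly. The sketch below runs in a scratch directory rather than /home/stern. The file names are made up, and it assumes <font face="Courier">mktemp -d</font> is available, which is common but not guaranteed everywhere.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)

# Reference file dated April 15, 1996, 14:45 local time,
# matching the 1445 in the command above.
touch -t 199604151445 "$dir/timestamp"

# One file older than the reference, one newer.
touch -t 199604151200 "$dir/older.txt"
touch -t 199604151600 "$dir/newer.txt"

# List regular files modified after the reference file.
newer=$(find "$dir" -type f -newer "$dir/timestamp" -print)
echo "$newer"

rm -rf "$dir"
```

Only newer.txt is reported, since the timestamp file is not newer than itself.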

If you're using a Secure RPC service, such as Secure NFS or NIS+, you also need solid time management to ensure that timestamp verifiers generated by a client are accepted by the server. The Secure RPC client encrypts the current time and passes it to the server, where it is decrypted to look for out-of-order or old requests that might signify a replay of a previous request. If you get messages like "auth-def validator mismatch" or "NIS+ received an invalid time stamp," your client clock has drifted too far behind the server's, and the server is rejecting Secure RPC requests.
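To make the server-side check concrete, here is a much-simplified sketch of a timestamp test of this kind. It is not the actual Secure RPC algorithm: the window size, the per-client "last timestamp" bookkeeping, and the function name are assumptions, and the decryption step is left out entirely.

```shell
# Decide whether an already-decrypted client timestamp is acceptable.
# $1 = server time, $2 = client timestamp,
# $3 = last timestamp accepted from this client,
# $4 = allowed window in seconds (hypothetical value)
check_timestamp() {
    server_now=$1
    client_ts=$2
    last_ts=$3
    window=$4
    skew=$(( server_now - client_ts ))
    if [ "$skew" -lt 0 ]; then
        skew=$(( -skew ))
    fi
    if [ "$skew" -gt "$window" ]; then
        echo "reject: clock skew"
    elif [ "$client_ts" -le "$last_ts" ]; then
        echo "reject: possible replay"
    else
        echo "accept"
    fi
}
```

For example, `check_timestamp 1000 800 700 60` is rejected for skew: a client whose clock has drifted beyond the window looks the same to the server as someone replaying an old request.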

As an increasing number of network services depend on timestamps, verification, and sequencing, it's a good idea to get your house's clocks in order. We'll start by going all the way down to the interrupt level to see how Unix tracks time.

Dali would be proud: Driving the soft clocks

When you set the time using <font face="Courier">date</font>, you're really setting an on-board, battery-backed hardware clock. You've probably noticed that your Sun systems retain their sense of time, even when powered off or between reboots, since the hardware clock keeps ticking even when you're not clicking. In an ideal world, you would set your system's clock once when you installed it and forget about it, letting the Unix kernel read its on-board watch to keep track of time.

The real world is less forgiving: your initial setting can only be as accurate as your own watch or time source, so with multiple administrators or multiple sources, you generally get slightly skewed system clocks. The hardware clock will exhibit some natural drift, but at a second or two per month it tends to be lost in the noise of variations in wristwatch settings used to set the time. What you need to worry about, however, is drift between the software clock and the hardware clock caused by system load. Despite this long wind-up on the virtues of a hardware clock, there's some software involved that keeps the user- and system-visible sense of time.

All Unix systems are driven by a hardware clock (hardclock) that interrupts at regular intervals. In the case of Sun's SPARC-based systems, the hardclock runs at 100 Hz, generating 100 interrupts a second. Hardclocks act as a system heartbeat, providing a steady drum beat to drive the scheduler and implement timeouts for system operations such as TCP transmissions and RPC requests. Note that the hardclock and the hardware clock are two different things: the hardclock is a source of constant interruption, and the hardware clock is a built-in I/O device that can be set and read.

On the surface, getting the system time should be as easy as reading the hardware clock. However, the current system time is used throughout the filesystem and virtual memory code for comparing timestamps, so it has to be accessible with minimum latency and overhead. Reading the hardware clock is much less efficient, and much slower, than reading a value out of memory. To avoid constantly referring to the hardware clock, the Unix kernel uses the hardclock to drive a software clock. That is, at boot time, the kernel reads the current value of the hardware clock into a 32-bit chunk of memory. In the interrupt handler for the hardclock, the softclock is incremented by 10 milliseconds. Voila -- parallel software and hardware clocks. Unfortunately, the software timepiece drifts quite frequently.

While your sense of time is important, other system events take precedence over the hardclock. Incoming characters on a serial line, for example, may not wait around while the system processes events on a timeout queue. The serial device needs to be read and reset as quickly as possible to avoid dropping input characters. The virtual memory system also masks hardclock interrupts while it updates memory management unit entries, or manipulates address space kernel structures. Put a high serial I/O load on the system, or thrash your address spaces, and you're going to miss hardclock interrupts. When you miss a hardclock, the software clock falls behind. Miss enough, and your Unix system clock appears to be running slowly.
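A toy simulation shows how quickly missed interrupts add up. It assumes the 100 Hz hardclock and 10-millisecond increment described above, and models load crudely by dropping one interrupt out of every N. That load model and the function name are invented for this sketch.

```shell
# Simulate a software clock driven by a 100 Hz hardclock.
# $1 = seconds of real time to simulate
# $2 = drop one interrupt out of every $2 (0 = never drop)
# Prints how many milliseconds the software clock lags real time.
softclock_lag_ms() {
    seconds=$1
    drop_every=$2
    tick_ms=10                      # each handled interrupt adds 10 ms
    total=$(( seconds * 100 ))      # interrupts generated by the hardclock
    soft_ms=0
    i=1
    while [ "$i" -le "$total" ]; do
        # Count the tick unless this interrupt is one of the dropped ones.
        if [ "$drop_every" -eq 0 ] || [ $(( i % drop_every )) -ne 0 ]; then
            soft_ms=$(( soft_ms + tick_ms ))
        fi
        i=$(( i + 1 ))
    done
    echo $(( seconds * 1000 - soft_ms ))
}
```

Losing one interrupt in fifty for a minute (`softclock_lag_ms 60 50`) leaves the software clock 1200 milliseconds behind.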

The softclock has a built-in adjustment mechanism. Periodically, it checks the value of the hardware clock, and determines if it needs to catch up. If the software clock is behind, it will slowly adjust itself by adding two or more "ticks" per hardclock until hardware and software are synchronized again. The incremental adjustment is known as the clock slew rate. The software clock catches up slowly to avoid abrupt changes in the user-visible time. If you are consistently missing hardclocks, the slow-sync approach may never get you caught up; you'll fall behind faster than the slew rate makes up for lost ticks. If this is a problem, you'll see two obvious symptoms. First, your system's clock will be losing time, and second, the clock interrupt rate shown by <font face="Courier">vmstat -i</font> will be less than 100 per second:

<font face="Courier">
duey% vmstat -i
interrupt         total     rate
clock          95738328       99
fdc0             510691        0
Total          96249019       99
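To put these numbers in perspective, the following sketch estimates how much time an uncorrected clock would lose at a given interrupt rate, and how long slewing would take to absorb a lag. The slew gain is a made-up parameter, since the article doesn't give a real kernel's value, and the function names are hypothetical.

```shell
# Seconds of software-clock time lost per day if only $1 of every
# 100 hardclock interrupts are handled and nothing corrects the loss.
lost_seconds_per_day() {
    echo $(( (100 - $1) * 86400 / 100 ))
}

# Estimate how long slewing takes to absorb a lag.
# $1 = current lag in ms
# $2 = extra ms the slew adds per second of real time (hypothetical)
# $3 = ms lost per second to missed interrupts
# Prints seconds needed, or "never" if losses outpace the slew.
catch_up_seconds() {
    lag=$1
    gain=$2
    loss=$3
    net=$(( gain - loss ))
    if [ "$net" -le 0 ]; then
        echo "never"
    else
        # Round up so a partial second still counts.
        echo $(( (lag + net - 1) / net ))
    fi
}
```

A clock that really handled only 99 of every 100 interrupts would lose 864 seconds, or more than 14 minutes, per day. If the slew adds less per second than the missed interrupts take away, `catch_up_seconds` reports "never", which is the fall-behind case described above.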

The clock interrupts are not counted in the generic output of <font face="Courier">vmstat</font>, but appear in the interrupt breakdown shown above. If you see the interrupt rate drop to 99 per second or lower, it's time to lighten your system load or use some aggressive clock management. We'll look at a simple <font face="Courier">rdate</font>-based scheme and then cover the Network Time Protocol, a finer-grain and gentler time management system.

"What Time Is It?" revisited

Included in the bevy of Berkeley r-commands is <font face="Courier">rdate</font>, the equivalent of the <font face="Courier">date</font> time-setting command that takes its input from another host on the network, in this case the machine timepiece:

<font face="Courier">
duey# rdate timepiece

The result will be system clocks that are synchronized to a common source, and accurate within a few seconds. Many system administrators stick an <font face="Courier">rdate</font> command in the crontab file, forcing clients to synchronize once a day or even once an hour with a well-known server. If you do go the <font face="Courier">rdate</font> route, make sure you redirect all of your output streams using a crontab entry like the following:

<font face="Courier">
0 * * * *  rdate timepiece > /dev/null 2>&1

If you don't catch stdout and stderr, <font face="Courier">rdate</font>'s confirmations will show up in root's mailbox, courtesy of <font face="Courier">cron</font>. A simple approach to <font face="Courier">rdate</font> synchronization is to have each NFS client talk to one of its NFS servers, and have the servers talk to a common, local time source. You can replicate this setup across multiple LANs, where cross-network time synchronization may not be as much of an issue if you aren't sharing PGP-signed messages, NFS mounted filesystems or NIS+ servers.
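For the curious, <font face="Courier">rdate</font> has traditionally fetched the time using the Time Protocol of RFC 868, in which the server answers with a 32-bit count of seconds since January 1, 1900, GMT. The sketch below shows only the conversion a client performs on that value. The function names are invented, the network exchange is omitted, and a shell with 64-bit arithmetic is assumed.

```shell
# Seconds between 1900-01-01 and 1970-01-01 (70 years, 17 leap days).
EPOCH_OFFSET=2208988800

# Convert a Time Protocol value (seconds since 1900) to Unix time
# (seconds since 1970).
time1900_to_unix() {
    echo $(( $1 - EPOCH_OFFSET ))
}

# Correction a client would apply, given the server's reply ($1)
# and the client's own Unix time ($2). Positive means the client is behind.
clock_offset() {
    server_unix=$(time1900_to_unix "$1")
    echo $(( server_unix - $2 ))
}
```

A whole-second value like this is one reason <font face="Courier">rdate</font> only gets you within a few seconds once network delay is added in.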
