Why aren't you logging?

Unix Insider –

Solaris 7 and 8 include a native implementation of file system journaling. That feature, known as intent logging or just logging enables faster file system operation and faster system boot. It's trivial to implement, safe to use, and free. Clearly, you should be using it.

The new logging feature is an option to the Unix File System (UFS), which is the standard file system for all disk partitions on Sun machines, except for partitions holding swap space. Logging is enabled on a per file system basis, and it can even be enabled on / and other operating system partitions.


Solaris UFS logging works by allocating space from the file system's free blocks. Within that space, all metadata changes to the file system are written. Metadata includes directory and i-node information but not file data blocks, essentially everything but the actual data within the file. So, for example, a file create modifies the directory structure and allocates a new i-node, and those activities are written to the logging space. Once the metadata changes are made to the logging area, the system is free to perform other operations to the file system. In the background, the information in the log is flushed to the file system and updates the appropriate directory and i-node structures, completing the file system operations.

The logging data is written sequentially within the log space. It's therefore much faster for the operating system to complete metadata changes via logging and background flushing than by directly modifying the metadata (via random I/O) spread across the disk. The size of the logging space is based on the size of the file system, and equals 1 MB per 1 GB of file system space, up to 64 MB. The space is used as a circular log: if the log space is about to fill up, new metadata change requests are paused while the log is emptied. As changes are moved from the log to the file system, that log space is made available, and new metadata changes can be written to the logging space.

Usually with UFS, if the system crashes during any file system operation, the entire system must have its consistency checked via the

<font face="Courier">fsck</font>
command. That command can take several minutes per file system because it checks all metadata and file data to ensure the structures are correct, free, and used, and that the i-node block counts are correct. It also confirms that the free space available is current, repairs inconsistencies, and occasionally requires manual intervention to fix large problems. Files and even directories can be lost, depending on the operations occurring at the time of the crash.

Because metadata changes are made first to the log space rather than to the file system, the consistency check for a logged file system after a crash is a simple and fast operation. The system evaluates the logging data and determines which changes had completed against the underlying file system, which had yet to start, and which were in progress. Those completed or not yet started are removed from the log, and those partly completed are either undone or completed. If there's sufficient data in the log to complete the operation, it's completed. Otherwise, the changes made are removed from the underlying file system.

Those familiar with database operation will recognize the similarity between database transaction processing and those activities. The end result is that the underlying file system is consistent, and no thorough consistency checking is needed. That operation completes in a few seconds per file system.

Using logging

Starting with Solaris 7, there's a new logging option to the

<font face="Courier">mount</font>
command and in the
<font face="Courier">/etc/vfstab</font>
system configuration file. Logging only appears in a couple other places within Solaris. The
<font face="Courier">mount</font>
command shows which partitions are mounted and lists logging in the options fields for each partition on which logging is enabled. Finally, at system boot time, the
<font face="Courier">fsck</font>
phase reports per partition whether each is stable, logging, or being checked. There are no other status commands available to determine the state of logging.

But does it work?

As with any new feature from an operating systems vendor, a bit of skepticism and try-before-you-buy attitude is helpful. To test the performance and functionality of UFS logging, we used an Ultra 10 running Solaris 8. A quick script to do 1,000 directory creations put plenty of pressure on the metadata handling of UFS:

<font face="Courier">
# more logging-test
# Test ufs logging performance

# Get to the right place and make a test directory
cd /big
mkdir test

# Create 1000 directories
integer i=0
while ((i < 1000))
        mkdir test/test${i}

The first test was done with logging disabled (the Solaris 7 and 8 default):

<font face="Courier">
# time ./logging-test

real       27.7
user        2.0
sys         4.9

# time rm -rf test

real        8.0
user        0.0
sys         0.3

The result was 35.7 seconds for the directory create and delete. Next, logging was enabled and the same test was run:

<font face="Courier">
# umount /big
# mount -o logging /big

# time ./logging-test

real       13.2
user        1.7
sys         5.4

# time rm -rf test

real        0.8
user        0.0
sys         0.2

The same commands with UFS logging enabled took 14 seconds to complete, a significant savings.

The next test was even more fun. With no logging, I power-cycled the Ultra 10 during the directory create script.

The result was not pretty. At boot, the system went into single-user mode and a manual fsck was required to repair the damage to the file system on

<font face="Courier">/big</font>
. After typing in many y responses to the various questions, the file system was again consistent and could be booted.

Trying the same test with logging enabled resulted in a pleasant and quick

<font face="Courier">
/dev/c0t1d0s2: is logging

message during

<font face="Courier">fsck</font>
, and the system booted without any issues.

There were some early bugs with UFS logging, so be sure to have the latest kernel patch cluster installed on your systems before enabling logging. It's also wise to test that feature in a QA lab or in another nonproduction environment.

Logging increases performance, decreases

<font face="Courier">fsck</font>
time, removes the risk of a file system corruption, can be used on all UFS partitions (including root), and is free. It obviously receives a two thumbs, way up rating.

Another mount option

There's another relatively new and useful option you should be aware of. It's the noatime mount option.

Without noatime, every time a file is opened for read, its access time i-node value is updated. That is useful for listing the date and time the file was last read, as displayed with the

<font face="Courier">ls -u</font>

There are many instances in which either the last access time is not interesting, keeping it up to date is too much overhead, or both. Web server contents (static text pages, for instance) and Usenet news directories are two good examples. In those instances, the overhead of performing one i-node write for every file open is quite heavy.

The noatime mount option decreases the frequency of access time update. Essentially, it tells the system to only update the access time if another update to the i-node is being done coincidently. No harm is done, especially on file systems where there's no interest in the last access time information.

Useful information from Sun

Sunsolve has two interesting articles worth looking at. One is about performing diagnostics on Sun hardware, and it contains information about Open Boot Prom commands, variables, and status information conveyed by the system and board LEDS. It also points to information about debugging hardware problems.

The other article discusses the new Solaris 8 memory architecture. It describes the new cyclic file system page cache, and considers why priority paging is not implemented in Solaris 8.

Insider: How the basic tech behind the Internet works
Join the discussion
Be the first to comment on this article. Our Commenting Policies