Journaling, Part 2

Following last week's discussion of journaling file systems, today I

will explore their inner workings and present the

available products. Before I explain how journaling file systems work,

let's review the vulnerabilities of traditional static file systems,

such as ext2fs.

Under a static file system, each file consists of two logical units: a

metadata block (commonly known as the inode) and the file's data. The

inode (information node) contains information about the physical

locations of the file's data blocks, its modification time, and so on. The

second logical unit consists of one or more blocks of data, which

needn't be contiguous. Thus, when an application changes the contents

of a file, ext2fs modifies the file's inode and its data in two

distinct, synchronous write operations. If an outage occurs in between,

then the file system's state is unknown and needs to be checked for

consistency. A metadata logging file system overcomes this

vulnerability by using a wrap-around, append-only log area on the disk.


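The two-step update described above can be sketched in a few lines of Python. This is a toy model, not ext2fs code: the Inode class, the write_file function, and the dictionary standing in for the raw device are all invented for this example.

```python
import time
from dataclasses import dataclass

BLOCK_SIZE = 4096  # ext2 commonly uses 1-4 KiB blocks

@dataclass
class Inode:
    block_ptrs: list      # physical block numbers holding the file's data
    size: int = 0         # file length in bytes
    mtime: float = 0.0    # last modification time

disk = {}  # block number -> bytes, standing in for the raw device

def write_file(inode, data):
    """Update a file: two distinct writes that a crash can tear apart."""
    # Write 1: the data blocks.
    for i, ptr in enumerate(inode.block_ptrs):
        disk[ptr] = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
    # An outage at this point leaves new data behind a stale inode;
    # the file system's state is unknown until checked (e.g., by fsck).
    # Write 2: the metadata.
    inode.size = len(data)
    inode.mtime = time.time()
```

The dangerous window is the gap between the two writes: nothing on disk records that an update was in flight, which is exactly what the log area fixes.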
The logging system records the state of each disk transaction in the

log area. Before any change is applied to the file system, an intent-to-

commit record is appended to the log. When the change has been

completed, the log entry is marked as complete. In the event of a

recovery from a failure, the system replays the log and checks for an

intent-to-commit record without a matching completion mark. Since every

modification to the file system is recorded in the log, the file system

only needs to read the log rather than performing a full file system

scan. If an intent-to-commit record without a completion mark is found,

then the change logged in that record is undone.
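The intent/commit protocol can be sketched as follows. The record names and helper functions here are illustrative inventions, not taken from any real file system; a real implementation logs physical or logical block updates, not strings.

```python
log = []      # append-only log area: (record_type, txn_id, payload)
applied = []  # stands in for the file system's current state

def perform(change):
    applied.append(change)

def undo(change):
    applied.remove(change)

def apply_change(txn_id, change, crash_before_complete=False):
    log.append(("intent", txn_id, change))   # 1. record intent to commit
    perform(change)                          # 2. apply to the file system
    if crash_before_complete:
        return                               # simulated outage
    log.append(("complete", txn_id, None))   # 3. mark the entry complete

def recover():
    """Replay the log instead of scanning the whole file system."""
    completed = {txn for kind, txn, _ in log if kind == "complete"}
    for kind, txn, change in log:
        if kind == "intent" and txn not in completed:
            undo(change)  # intent without a completion mark: roll back
```

Recovery cost is proportional to the log length, not to the size of the file system, which is why journaled volumes come back online so quickly.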

Let's look at a concrete example. Suppose we have a file that contains

three data blocks: 1, 2, and 3. The first two blocks are contiguous:


The b area indicates discarded data blocks and H is the file header.

Now an application updates blocks 2 and 3. Consequently, the file

system looks as follows (the a area marks obsolete data blocks that

previously contained blocks 2 and 3, and the header):


Notice that the modified data was appended to the end: first, the

blocks 2 and 3, and finally the header. The previous locations of

blocks 2, 3, and the header were discarded. This approach has several

advantages. It's faster because the system doesn't need to seek all

over the disk to write parts of the file, and it's safer because file

parts that have been changed aren't lost until the log has successfully

written the new blocks. Finally, a recovery after a crash is much

faster because the logging system needs to check only the updates that

took place after the last checkpoint.
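The append-then-header ordering in the example above can be sketched like this. Again a toy model with invented names: a real log-structured layout tracks segments and on-disk slots, not Python lists.

```python
segment = []  # append-only storage area; list index = physical slot
header = {}   # file header: logical block number -> slot in `segment`

def append_block(data):
    """Append one block to the end of the storage area."""
    segment.append(data)
    return len(segment) - 1

def write_blocks(updates):
    """Update blocks by appending, never overwriting in place."""
    # First append the new data blocks...
    new_slots = {n: append_block(data) for n, data in updates.items()}
    # ...and only then update the header, so a crash before this point
    # leaves the old versions of the blocks fully intact.
    header.update(new_slots)

def read_block(n):
    return segment[header[n]]
```

After updating blocks 2 and 3, their old copies are still physically present but unreferenced, matching the obsolete areas in the figures; a real system reclaims that space later.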

At present, there are several journaling file systems available for

Linux. SGI's XFS file system is an open source product: a reliable,

fast, 64-bit file system. IBM's JFS is another highly

acclaimed open source product. Its 1.0.0 version was released recently.

Further information on JFS is available on IBM's Web site.
