Journaling and Logging

The traditional Linux file systems were based on the legacy Unix file
systems. Such file systems (e.g. ext2fs) are static, which means they
do not track changes applied to files and directories to guarantee that
all updates are performed safely. Furthermore, ext2fs works
asynchronously. Information about a file -- for example its
permissions, creation date, and ownership -- is written in a delayed
fashion and, often, in several distinct operations.
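
The practical consequence of this asynchrony is that an application
cannot assume its data has reached the disk just because a write call
returned. The following minimal sketch (the file name and record are
invented for the example) shows how a program on an ext2-style file
system must explicitly request a flush if it needs durability:

    /* Minimal sketch: on an asynchronously written file system such as
     * ext2fs, write() only places data in the kernel's cache; the file
     * contents and its inode (permissions, timestamps, size) are flushed
     * later, possibly in separate steps. The file name is illustrative. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *msg = "important record\n";
        int fd = open("/tmp/example.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Lands in the cache; not yet guaranteed to be on disk. */
        if (write(fd, msg, strlen(msg)) < 0) {
            perror("write");
            return 1;
        }

        /* Force both the data and the inode metadata out to the disk. */
        if (fsync(fd) < 0) {
            perror("fsync");
            return 1;
        }

        close(fd);
        return 0;
    }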

This approach results in a noticeable performance gain; however, it
also introduces data consistency problems. If a power failure occurs
exactly when the file system has updated the contents of a file but
before it has managed to update the file's metadata (its inode), then
the file becomes corrupted. Worse yet, if the disk is highly
fragmented, other files are likely to have been corrupted as well, and
the entire directory may need to be restored.

Traditionally, a process called fsck (file system check) would check
the file system during reboot and detect the corrupted files. In some
cases, it would manage to fix them too, but usually you would have to
reconstitute the files from a backup set. Checking a large volume can
take a long time, during which the machine is out of service. In the
Internet age, when servers are required to stay up for months, this
approach is unacceptable. The demand for a more reliable file system
and faster recovery time led to the development of several journaling
and logging file systems.

What is journaling?

The concept, introduced about a decade ago in database systems, ensures
data consistency and integrity in the event of a failure during a
transaction. A typical database journaling system records every
operation applied to the database records. If a transaction can't be
completed due to a hardware fault or a network failure, then the
database system restores the records to their original state. A
journaling file system uses a similar method: it records inode
(metadata) changes in a journal before applying them, so that an
interrupted update can be completed or rolled back after a crash.
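
To make the idea concrete, here is a simplified user-space sketch of
the write-ahead ordering that journaling relies on. Real journaling
file systems apply this inside the kernel to inode updates; the file
names and record layout below are invented for the example:

    /* Simplified sketch of the write-ahead idea behind journaling.
     * The rule: describe the change in the journal and flush it to disk
     * *before* touching the real data. After a crash, recovery replays
     * journal entries that were committed but not yet applied. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Append one record to the journal file and force it to disk. */
    static int journal_append(int jfd, const char *record)
    {
        if (write(jfd, record, strlen(record)) < 0)
            return -1;
        return fsync(jfd);      /* the journal entry is now durable */
    }

    int main(void)
    {
        int jfd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        int dfd = open("data.db", O_WRONLY | O_CREAT, 0644);
        if (jfd < 0 || dfd < 0) {
            perror("open");
            return 1;
        }

        /* 1. Record the intended change in the journal first. */
        if (journal_append(jfd, "BEGIN: set record 7 = 'new value'\n") < 0)
            return 1;

        /* 2. Only now apply the change to the real data. */
        const char *value = "new value";
        if (pwrite(dfd, value, strlen(value), 7 * 32) < 0 || fsync(dfd) < 0)
            return 1;

        /* 3. Mark the transaction complete so recovery can skip it. */
        if (journal_append(jfd, "COMMIT: record 7\n") < 0)
            return 1;

        close(jfd);
        close(dfd);
        return 0;
    }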

Logging, as opposed to journaling, keeps track of both inode changes
and file content changes. Each of these approaches has advantages and
drawbacks. In terms of performance overhead, journaling requires fewer
resources, but logging enables faster recovery. In either case,
recovery time is much faster than with a static file system.
Furthermore, recovery doesn't necessitate a reboot.
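
One way to picture the difference is to look at what a single recovery
record has to carry in each scheme. The structures below are invented
for illustration and match no particular file system's on-disk format:

    /* A metadata-only journal entry describes just the inode change;
     * a logging entry also carries the new file contents, which costs
     * more to write but lets recovery restore the data directly. */
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Journaling: record only the inode (metadata) change. */
    struct journal_entry {
        uint32_t inode;          /* which inode changed     */
        uint32_t new_size;       /* updated metadata fields */
        uint32_t new_mtime;
    };

    /* Logging: record the metadata change *and* the changed data. */
    struct log_entry {
        uint32_t inode;
        uint32_t new_size;
        uint32_t new_mtime;
        off_t    data_offset;    /* where in the file the data goes */
        uint32_t data_len;
        unsigned char data[];    /* the new file contents follow    */
    };

    int main(void)
    {
        printf("journal entry: %zu bytes; log entry header: %zu bytes\n",
               sizeof(struct journal_entry), sizeof(struct log_entry));
        return 0;
    }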

Next week, I will explore this issue in further detail and present some
of the available journaling and logging file systems for Linux.
