Raw Disk I/O

System-based I/O vs. Raw I/O

Before version 2.2, the Linux kernel supported only one type of disk

I/O -- system-based I/O in which a process interacts with a physical

device through an intermediary kernel buffer. This intermediary buffer

is transparent to the user: calls to read(), write(), lseek(), etc. seem

as if they access a physical file directly. In practice, however, the

kernel intercepts the calls and transfers the data to its own buffer

before passing it on to the physical device or process. System-based

I/O offers several advantages. First, the kernel can cache data and thus

reduce the number of physical I/O operations needed. Second, the kernel

protects overall system performance because it won't allow a runaway

process to starve other processes. Finally, certain devices are fussy

about the size of data being written or read (for instance, disk drives

usually handle fixed-size chunks of 512 bytes on each I/O operation).

The kernel hides these physical limitations from the user. In raw disk

I/O, by contrast, the process interacts with a physical device directly,

bypassing the kernel's intermediary buffer.
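
Without the kernel's buffer in between, a process has to respect the device's chunk size itself. The following is a minimal sketch of that bookkeeping in Python, assuming the 512-byte chunk size mentioned above. The names SECTOR_SIZE, aligned_span and read_range are hypothetical, and a real program should query the device for its actual block size. When run against an ordinary file, the read still goes through the kernel's buffer, so the code only illustrates the alignment arithmetic, not raw I/O itself.

```python
import os

SECTOR_SIZE = 512  # assumption: the fixed chunk size mentioned in the text


def aligned_span(offset, length, sector=SECTOR_SIZE):
    """Return (start, size) of the smallest sector-aligned range
    covering the bytes [offset, offset + length)."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    start = (offset // sector) * sector              # round the start down
    end = -(-(offset + length) // sector) * sector   # round the end up
    return start, end - start


def read_range(fd, offset, length, sector=SECTOR_SIZE):
    """Read an arbitrary byte range by issuing one sector-aligned read
    and slicing the requested bytes out of the result."""
    start, size = aligned_span(offset, length, sector)
    block = os.pread(fd, size, start)  # may return fewer bytes near end of file
    skip = offset - start
    return block[skip:skip + length]
```

For instance, a request for 4 bytes at offset 510 straddles two 512-byte chunks, so aligned_span(510, 4) asks for 1024 bytes starting at offset 0. This is the kind of rounding that system-based I/O performs on the caller's behalf.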

Uses of Raw Disk I/O

Although the traditional system-based I/O is satisfactory in most

cases, some applications must use raw I/O. One common scenario is in

data-critical applications, where the user wants to ensure that the

data is written to a disk immediately so that it isn't lost in the

event of a system failure. Specialized applications, say a relational

database engine, often use their own I/O caching algorithms. In such

applications, the overhead of the kernel's caching is unnecessary.
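
For comparison, a process that stays with system-based I/O can approximate "written immediately" by asking the kernel to flush its buffer after each write. The sketch below does this with os.fsync. Note that this is a different technique from raw I/O: the data still passes through the kernel's buffer, and how durable the result is depends on the device and filesystem. The function name write_durably is hypothetical.

```python
import os


def write_durably(path, payload):
    """Write payload to path, then ask the kernel to flush its buffered
    copy to the underlying device before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)  # write() may accept only part of the data
            view = view[written:]
        os.fsync(fd)  # block until the kernel reports the data as flushed
    finally:
        os.close(fd)
    return len(payload)
```

Every such call pays for both the copy into the kernel's buffer and the flush. Applications that manage their own cache avoid that double handling by using raw I/O.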

Raw I/O Past, Present, and Future

The concept of raw disk I/O is hardly new. Several attempts have been

made in the past to introduce raw I/O to Unix. Indeed, quite a few

variants such as IBM's AIX now support it. The problem, however, is

that most existing implementations require literally doubling the

number of device nodes. The Linux developers rejected this approach. Instead,

kernel 2.4 uses a pool of device nodes that can be associated with any

arbitrary block device. A new object called "kiobuf" was introduced. A

kiobuf is an abstraction of a set of kernel pages set up during boot

time. Raw I/O is achieved by creating a kiobuf and populating it with

the physical pages a given process is using for I/O, without any

intermediate copies.

The kiobuf object is then passed to the I/O layers for reading and

writing. The current Linux implementation isn't perfect yet, but it's

constantly improving. For further information, you can subscribe to the

kiobuf-io-devel list server or read prior postings here:

