Advanced I/O Techniques, Part 1

This week I will introduce the principles of I/O multiplexing. This
introduction will serve as the basis for next week's discussion of
the select() syscall.

Fast and Slow Files

I/O operations on regular files always block. This means that once you
call read() or write(), the process waits until the function returns.
When dealing with disk files, this isn't a problem because these files
are stored on a local disk and the execution time of the syscalls that
access them is more or less predictable. Yet certain file types have
unpredictable completion times. For example, a read() from an empty pipe
blocks until a writer supplies data, however long that takes. Files that
may take an indeterminate amount of time to complete an I/O operation
are called "slow files".

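To make the notion of a slow file concrete, here is a minimal sketch that measures how long a blocking read() on an empty pipe waits. The helper name blocked_read_ms() and its delay_ms parameter are hypothetical, introduced only for this illustration; the child process plays the part of a writer that takes its time.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: returns roughly how many milliseconds a blocking
   read() on an initially empty pipe had to wait, or -1 on error. */
long blocked_read_ms(long delay_ms)
{
    int fds[2];
    struct timespec start, end;
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    char c;

    if (pipe(fds) == -1)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &start);

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: the slow writer */
        close(fds[0]);
        nanosleep(&delay, NULL);        /* keep the reader waiting */
        _exit(write(fds[1], "x", 1) == 1 ? 0 : 1);
    }

    close(fds[1]);                      /* parent keeps only the read end */
    ssize_t n = read(fds[0], &c, 1);    /* blocks until the child writes */
    clock_gettime(CLOCK_MONOTONIC, &end);

    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != 1)
        return -1;
    return (end.tv_sec - start.tv_sec) * 1000L
         + (end.tv_nsec - start.tv_nsec) / 1000000L;
}
```

Calling blocked_read_ms(200) should report a wait of at least about 200 milliseconds: the reader can do nothing else for as long as the writer stays silent.
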
I/O Multiplexing

Things get more complicated when dealing with multiple file descriptors
simultaneously. Consider a Web server process that constantly polls 200
client connections, each of which sends requests to the server. A naive
implementation of this server would look as follows:

    while (1)
    {
        for (int i = 0; i < 200; i++)
        {
            /* blocks here until client i has sent something */
            ssize_t n = read(file_descriptors[i], buff, buffsize);

            /*...process the n bytes of data*/
        }
    }

Alas, if one file descriptor has no data ready, read() blocks the whole
loop until data arrives on that descriptor, even though any of the
remaining 199 clients may already be waiting to be served. Obviously,
this is a very bad idea.

Enter Nonblocking I/O

Using nonblocking I/O to access slow files would undoubtedly improve
matters. The fcntl() syscall enables you to switch an already open slow
file to nonblocking mode by setting the O_NONBLOCK flag on its
descriptor. When a descriptor is nonblocking, read() always returns
immediately; if no data is available, it returns -1 and sets errno to
EAGAIN. A return value of 0 still means end of file.
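
As a minimal sketch of the above, the two helpers below set O_NONBLOCK with fcntl() and then classify the result of a single read(). The names set_nonblocking(), try_read(), READ_WOULD_BLOCK and READ_ERROR are hypothetical and exist only for this illustration; the fcntl() and read() calls themselves are standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for try_read() below. */
enum { READ_WOULD_BLOCK = -1, READ_ERROR = -2 };

/* Switch an already open descriptor to nonblocking mode.
   Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);      /* fetch the current flags */
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* One nonblocking read attempt. Returns the number of bytes read,
   0 at end of file, READ_WOULD_BLOCK if no data is available yet,
   or READ_ERROR for any other failure. */
ssize_t try_read(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n >= 0)
        return n;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;
    return READ_ERROR;
}
```

On the read end of an empty pipe, try_read() yields READ_WOULD_BLOCK while a writer still holds the other end open, and 0 once every writer has closed it. Keeping those two cases apart is what lets the caller distinguish "nothing yet" from "connection finished".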

Still, this isn't a perfect solution. The problem with polling
nonblocking file descriptors is that the program never blocks! It
executes the loop continually, burning CPU cycles even when no client
has anything to say. What we really want is for the kernel to notify our
process when data is available on one or more file descriptors. When no
data is available, the process should block and thus avoid wasting
system resources.
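
The cost of busy polling can be made visible with a small experiment. The sketch below, built around a hypothetical helper named count_wasted_polls(), spins on a nonblocking pipe and counts how many read() calls return empty-handed before a deliberately slow child process finally writes one byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of read() calls that failed
   with EAGAIN before the data arrived, or -1 on error. */
long count_wasted_polls(long delay_ms)
{
    int fds[2];
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    if (pipe(fds) == -1)
        return -1;
    int flags = fcntl(fds[0], F_GETFL, 0);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: writes after a delay */
        close(fds[0]);
        nanosleep(&delay, NULL);
        _exit(write(fds[1], "x", 1) == 1 ? 0 : 1);
    }

    close(fds[1]);
    long wasted = 0;
    char c;
    for (;;) {                          /* the loop never blocks */
        ssize_t n = read(fds[0], &c, 1);
        if (n == 1)
            break;                      /* data finally arrived */
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            wasted++;                   /* a syscall that achieved nothing */
            continue;
        }
        wasted = -1;                    /* EOF or a real error */
        break;
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return wasted;
}
```

Even a short delay typically produces a large count, and each of those iterations is a system call that accomplished nothing while keeping a CPU core busy.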

Next week, we will see exactly how to do that.
