Consider a naive file reading benchmark that opens a small file and
reads it to "see how fast the disk goes." If the file was created
recently, all of it may still be in memory; otherwise, the first read
pass loads it into memory. Subsequent runs may then be fully cached,
with a 100 percent hit rate and no page-ins from disk at all.
The benchmark ends up measuring memory speed, not disk speed. The best
way to make the benchmark measure disk speed is to invalidate the cache
entries by unmounting and remounting the filesystem between each run
of the test.
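The cache effect is easy to demonstrate. The sketch below, in Python, times two consecutive reads of a freshly written file; the file name and size are arbitrary choices for illustration. It does not attempt the unmount/remount step (which needs privileges), so the second read is almost always served from the page cache and measures memory speed, exactly the trap described above.

```python
import os
import time

PATH = "benchfile.tmp"          # hypothetical scratch file
SIZE = 8 * 1024 * 1024          # 8 megabytes of test data

# Create the file; on most systems its pages are now in memory.
with open(PATH, "wb") as f:
    f.write(os.urandom(SIZE))

def timed_read(path):
    """Read the whole file, returning (elapsed seconds, bytes read)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return time.perf_counter() - start, len(data)

first, n1 = timed_read(PATH)    # may or may not go to the disk
second, n2 = timed_read(PATH)   # almost certainly served from the cache
print(f"first read: {first:.4f}s  second read: {second:.4f}s")

os.remove(PATH)
```

To turn this into a genuine disk benchmark, the cache would have to be invalidated between the two reads, by unmounting and remounting the filesystem as described above.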
The complexities of the entire virtual memory system and paging
algorithm are beyond the scope of this article. The key point is that
data is evicted from the cache only when the free memory list gets too
small. The data evicted is any page that has not been referenced
recently -- where recently can mean anything from a few seconds to a
few minutes. Page-outs occur whenever data is reclaimed for the free
list because of a memory shortage; they occur to all filesystems but
are often concentrated on the swap space.
Disk array units, such as Sun's SPARCstorage Array or hardware RAID
subsystems from other vendors, contain their own cache RAM. This cache
is so small compared with the amount of disk space in the array that
it is of little use as a read cache. If there is a lot of data to read
and reread, it is better to add large amounts of RAM to the main
system than to the disk subsystem: the in-memory page cache is a
faster and more useful place to cache data.
A common setup is to have reads bypass the disk array cache,
reserving all of the cache space to speed up writes. If the array has
idle time and memory to spare, the controller might also look for
sequential read patterns and prefetch some read data. In a busy
array, however, this can get in the way. The OS does its own
prefetching in any case.
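An application can also encourage the OS's own prefetch (readahead) machinery explicitly. A minimal sketch, assuming a platform with the POSIX fadvise interface (it exists on Linux; `os.posix_fadvise` is absent on some systems, hence the guard), and using a placeholder file name:

```python
import os

PATH = "datafile.tmp"           # placeholder file for illustration
with open(PATH, "wb") as f:
    f.write(b"x" * 65536)

fd = os.open(PATH, os.O_RDONLY)
if hasattr(os, "posix_fadvise"):
    # Declare a sequential access pattern over the whole file (offset 0,
    # length 0 = to end), so the OS may prefetch ahead of the reader.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
data = os.read(fd, 65536)       # reads benefit from any readahead done
os.close(fd)
os.remove(PATH)
```

The hint is advisory: the OS is free to ignore it, and the read works identically either way.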
There are three main situations that are helped by the write cache.
When a lot of data is being written to a single file, it is often sent
to the disk array in small blocks, perhaps 2 kilobytes to 8 kilobytes
in size. The array can use its cache to coalesce adjacent blocks, which
means that the disk gets fewer, larger writes to handle. The reduction
in the number of seeks greatly increases performance and cuts service
times dramatically. This operation is only safe if the array's cache
has battery backup (nonvolatile RAM), as the operating system assumes
that when a write completes, the data is safely on the disk. As an
example, 2-kilobyte raw writes during a database load can
go two to three times faster.
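The coalescing idea can be modeled at user level. The sketch below merges many small adjacent writes into one large write before anything reaches the disk, which is the effect the array cache provides in hardware; the block sizes and file names are arbitrary, and no particular array's firmware is being shown.

```python
import io
import os

SMALL = 2 * 1024                # 2-kilobyte blocks, as in the text
COUNT = 64                      # 64 blocks, 128 kilobytes in total
payload = os.urandom(SMALL * COUNT)

# Uncoalesced: one write call per small block (without a cache, each
# one would be a separate disk operation).
with open("small_writes.tmp", "wb", buffering=0) as f:
    for i in range(COUNT):
        f.write(payload[i * SMALL:(i + 1) * SMALL])

# Coalesced: accumulate adjacent blocks in a buffer, then issue a
# single large write in their place.
buf = io.BytesIO()
for i in range(COUNT):
    buf.write(payload[i * SMALL:(i + 1) * SMALL])
with open("coalesced.tmp", "wb", buffering=0) as f:
    f.write(buf.getvalue())     # one large write instead of 64 small ones

# Both orderings produce identical on-disk data; the coalesced version
# simply costs far fewer I/O operations and seeks.
same = (open("small_writes.tmp", "rb").read()
        == open("coalesced.tmp", "rb").read())
os.remove("small_writes.tmp")
os.remove("coalesced.tmp")
```

The nonvolatile-RAM caveat above is what lets a real array acknowledge each small write immediately and coalesce behind the scenes without risking data loss.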