The 1 megabyte cache
choice is always faster, but it will have a proportionately greater
benefit on performance and scalability of a 24 CPU E6000 than on a 4
An anomaly can arise in some cases. It takes time to search the
cache and to load new data into it, which slows down the fetch
operation. If you normally operate in a cache-busting mode, you actually
go faster if there is no cache at all. Since caches add to the cost of
a system (extra memory used or extra hardware), they may not be present
on a simpler or cheaper system. The anomaly is that a cheaper system
may be faster than a more expensive one for some kinds of work.
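The anomaly can be put into numbers with a very simple model. The sketch below assumes that every fetch pays a fixed cost to search the cache, and that each miss also pays the memory access plus the cost of loading the new data into the cache. The function names and the costs (arbitrary time units) are made up for illustration; they are not measurements of any real system.

```python
def avg_fetch_time(hit_rate, search_cost, fill_cost, memory_cost):
    """Average fetch time with a cache in front of memory.

    Every fetch pays search_cost; a miss additionally pays the memory
    access and the cost of loading the new data into the cache.
    """
    miss_rate = 1.0 - hit_rate
    return search_cost + miss_rate * (memory_cost + fill_cost)


def break_even_hit_rate(search_cost, fill_cost, memory_cost):
    """Hit rate at which the cached system is exactly as fast as going
    straight to memory (which costs memory_cost per fetch).

    Solves: search + (1 - h) * (memory + fill) = memory
    """
    return 1.0 - (memory_cost - search_cost) / (memory_cost + fill_cost)


# Hypothetical costs in arbitrary time units.
SEARCH, FILL, MEMORY = 1.0, 2.0, 10.0

cache_busting = avg_fetch_time(0.05, SEARCH, FILL, MEMORY)   # almost all misses
cache_friendly = avg_fetch_time(0.95, SEARCH, FILL, MEMORY)  # almost all hits
threshold = break_even_hit_rate(SEARCH, FILL, MEMORY)

print(f"no cache:        {MEMORY:.2f}")
print(f"cache, 5% hits:  {cache_busting:.2f}")
print(f"cache, 95% hits: {cache_friendly:.2f}")
print(f"break-even hit rate: {threshold:.2f}")
```

With these assumed costs, the cache only pays for itself once the hit rate is above the break-even point; below it, the system with no cache at all wins.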
The microSPARC processor used in the SPARCstation 5 is very simple
-- it has a high cache-miss rate but a low cache-miss cost. The
SuperSPARC processor used in the SPARCstation 20 has much bigger
caches, a far lower cache-miss rate, but a much higher cache-miss cost.
For a commercial database workload, the large cache works well, and the
SPARCstation 20 is much faster. For a large Verilog EDA simulation, the
caches are all too small to help, and the low latency to memory makes
the SPARCstation 5 an extremely fast machine.
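The trade-off between miss rate and miss cost can be sketched as follows. All of the figures are hypothetical, chosen only to show the shape of the effect; they are not measured values for the SPARCstation 5 or SPARCstation 20, and the machine and workload names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    hit_cost: float   # cycles per access that hits in the cache
    miss_cost: float  # extra cycles per access that misses


# Hypothetical machines: one simple design with a cheap miss, one
# complex design with a big cache but an expensive miss.
SMALL = Machine("small_cache", hit_cost=1.0, miss_cost=10.0)
BIG = Machine("big_cache", hit_cost=1.0, miss_cost=40.0)

# Hypothetical miss rates per workload. The "database" working set fits
# the big cache well; the "eda" working set overflows both caches.
MISS_RATES = {
    "database": {"small_cache": 0.20, "big_cache": 0.02},
    "eda": {"small_cache": 0.50, "big_cache": 0.45},
}


def avg_cycles(machine, workload):
    """Average cycles per memory access for a machine on a workload."""
    miss_rate = MISS_RATES[workload][machine.name]
    return machine.hit_cost + miss_rate * machine.miss_cost


def faster(workload):
    """Name of the machine with the lower average access cost."""
    return min((SMALL, BIG), key=lambda m: avg_cycles(m, workload)).name


for workload in MISS_RATES:
    for machine in (SMALL, BIG):
        print(f"{workload:9s} {machine.name:12s} {avg_cycles(machine, workload):5.1f}")
    print(f"{workload:9s} winner: {faster(workload)}")
```

Under these assumptions the big cache wins when it cuts the miss rate dramatically, and the cheap miss wins when neither cache can hold the working set.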
The lessons learned from this experience were incorporated in the
UltraSPARC design, which also has very low latency to memory. To get
the low latency, the latest, fastest cache memory and very fast wide
system buses are needed. This is one reason why UltraSPARC caches, at 1
megabyte, are currently smaller than those of some other systems, which
have larger, slower caches and a higher cache-miss cost.
If we consider our other examples, the DNLC is caching a filename.
If the system cannot find the corresponding inode number in the
DNLC, it has to read through the directory structures of the
filesystem. This may involve a linear search through a UFS directory
structure, or a series of readdir calls over NFS to an NFS server.
There is some additional caching of blocks of directory data and NFS
attributes that may save time, but often the search has to sleep for
many milliseconds waiting for several disk reads or network requests to
complete.
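To make the cost of a DNLC miss concrete, here is a toy name cache in front of a linear directory search. It is only a sketch: the NameCache class is hypothetical, the directory is an in-memory list standing in for on-disk directory blocks, and the real DNLC is a bounded kernel cache keyed on the parent directory as well as the name.

```python
class NameCache:
    """Toy name-to-inode cache backed by a linear directory search."""

    def __init__(self, directory):
        # directory: list of (name, inode) pairs, a stand-in for the
        # entries stored in a UFS directory.
        self.directory = directory
        self.cache = {}
        self.hits = 0
        self.misses = 0
        self.entries_scanned = 0

    def lookup(self, name):
        inode = self.cache.get(name)
        if inode is not None:
            self.hits += 1
            return inode
        # Cache miss: walk the directory one entry at a time.
        self.misses += 1
        for entry_name, entry_inode in self.directory:
            self.entries_scanned += 1
            if entry_name == name:
                self.cache[name] = entry_inode
                return entry_inode
        return None  # no such file


# Hypothetical directory holding 500 files.
directory = [(f"file{i}", 1000 + i) for i in range(500)]
names = NameCache(directory)

first = names.lookup("file499")   # miss: scans every entry
second = names.lookup("file499")  # hit: no scan at all

print(first, second, names.hits, names.misses, names.entries_scanned)
```

In a real system each step of that scan may also need a disk read or an NFS request, which is where the milliseconds go.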
You can monitor the number of directory blocks read per second using
sar -a, which reports it as dirbk/s. You can also look at the number of
NFS2 readdir calls and NFS3 readdirplus calls. NFS2 reads a single entry
with each readdir call, while NFS3 adds the readdirplus call, which reads
a series of entries in one go for greater efficiency (but longer latency
-- more next month on NFS3).
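If you collect sar -a output over time, a small script can pull out the dirbk/s column for you. The sketch below assumes the usual layout of a header line naming the columns followed by one row per sample; the parse_dirbk function and the sample text are made up for illustration, so check them against the output on your own system.

```python
def parse_dirbk(sar_text):
    """Return (label, dirbk_per_sec) pairs from captured sar -a output.

    The column is located by name in the header line, so the code does
    not depend on dirbk/s being in any particular position.
    """
    samples = []
    column = None
    for line in sar_text.splitlines():
        fields = line.split()
        if "dirbk/s" in fields:
            column = fields.index("dirbk/s")
            continue
        if column is None or len(fields) <= column:
            continue  # banner lines, blank lines, short rows
        try:
            samples.append((fields[0], float(fields[column])))
        except ValueError:
            continue  # not a numeric data row
    return samples


# Hypothetical sample, not captured from a real machine.
SAMPLE = """\
10:00:00  iget/s namei/s dirbk/s
10:00:05       0      12       3
10:00:10       2     240      95
Average        1     126      49
"""

for label, rate in parse_dirbk(SAMPLE):
    print(f"{label:10s} {rate:6.1f} directory blocks/s")
```

A sustained high dirbk/s value suggests that name lookups are missing the DNLC and falling back to directory searches.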