Get better results when you design your cache to match your applications and system

Unix Insider |  Operating Systems

The 1 megabyte cache
choice is always faster, but it will have a proportionately greater
benefit on the performance and scalability of a 24-CPU E6000 than on a
4-CPU E3000.


An anomaly can arise in some cases. It takes time to search the
cache and to load new data into it, and this overhead slows down every
fetch. If you normally operate in a cache-busting mode, you actually
go faster if there is no cache at all. Since caches add to the cost of
a system (extra memory used or extra hardware), they may not be present
on a simpler or cheaper system. The anomaly is that the cheaper system
may be faster than a more expensive one for some kinds of work.


The microSPARC processor used in the SPARCstation 5 is very simple
-- it has a high cache-miss rate but a low cache-miss cost. The
SuperSPARC processor used in the SPARCstation 20 has much bigger
caches, a far lower cache-miss rate, but a much higher cache-miss cost.
For a commercial database workload, the large cache works well, and the
SPARCstation 20 is much faster. For a large Verilog EDA simulation, the
caches are all too small to help, and the low latency to memory makes
the SPARCstation 5 an extremely fast machine.


The lessons learned from this experience were incorporated in the
UltraSPARC design, which also has very low latency to memory. To get
the low latency, the latest, fastest cache memory and very fast wide
system buses are needed. This is one reason why UltraSPARC caches, at 1
megabyte, are currently smaller than those of some other systems, which
have larger, slower caches and a higher cache-miss cost.


If we consider our other examples, the DNLC is caching a filename.
If the system cannot find the corresponding inode number in the
DNLC, it has to read through the directory structures of the
filesystem. This may involve a linear search through a UFS directory
structure, or a series of readdir calls to an NFS server.
There is some additional caching of blocks of directory data and NFS
attributes that may save time, but often the search has to sleep for
many milliseconds waiting for several disk reads or network requests to
complete.


You can monitor the number of directory blocks read per second using
sar -a as dirbk/s; you can also look at
the number of NFS2 readdir calls and NFS3
readdirplus calls. NFS2 reads a single entry with each
readdir call, while NFS3 adds the readdirplus
call, which reads a series of entries in one go for greater efficiency
(but longer latency -- more next month on NFS3).
