Get better results when you design your cache to match your applications and system


For small on-chip caches, there may be
"n-way associative mapping," where there are "n" choices of location for
each cache line, usually two or four, with an LRU or random choice
implemented in hardware. Random replacement seems to offend some of the
engineering purists among us because it can never work optimally, but I
like it because the converse is also true: it rarely works badly
either!
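
To make this concrete, here's a rough sketch in C of an n-way
set-associative lookup with random replacement. The sizes, the address
split, and the rand() victim choice are my own illustrations, not a
model of any particular CPU:

    /* A minimal sketch of an n-way set-associative lookup with random
     * replacement.  The sizes and address split are illustrative. */
    #include <stdint.h>
    #include <stdlib.h>

    #define NSETS 256                 /* sets in the cache */
    #define NWAYS 4                   /* "n" choices of location per line */

    struct line {
        int      valid;
        uint32_t tag;
    };

    static struct line cache[NSETS][NWAYS];

    /* Returns 1 on a hit; on a miss, a randomly chosen way is evicted. */
    static int lookup(uint32_t addr)
    {
        uint32_t set = (addr >> 6) % NSETS;    /* 64-byte lines assumed */
        uint32_t tag = addr >> 14;             /* 6 offset + 8 index bits */

        for (int way = 0; way < NWAYS; way++)
            if (cache[set][way].valid && cache[set][way].tag == tag)
                return 1;                      /* hit in one of the n ways */

        int victim = rand() % NWAYS;           /* random replacement: no
                                                  usage history is kept */
        cache[set][victim].valid = 1;
        cache[set][victim].tag   = tag;
        return 0;
    }

An LRU version would have to store and update ordering bits for every
set on every access; the random version keeps no history at all, which
is part of its appeal in hardware.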


Caches work well if the access pattern is somewhat random with some
locality, or if it has been carefully constructed to work with a
particular cache policy (an example is SunSoft WorkShop's highly tuned
math library). Unfortunately, many workloads have structured access
patterns that work against the normal policies. Random policies can be
fast, need no extra storage to remember the usage history, and can
avoid some nasty interactions.
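
As an example of a structured access pattern working against the normal
policy, consider this hypothetical loop, which reuses the lookup()
sketch above. It cycles through five blocks that all map to the same
set of the four-way cache; under strict LRU every access would miss,
because LRU always evicts exactly the block that is needed next, while
random replacement stumbles into some hits:

    int main(void)
    {
        /* NWAYS + 1 = 5 blocks, each NSETS * 64 bytes apart, so every
         * one of them maps to set 0 of the 4-way cache above. */
        for (int pass = 0; pass < 1000; pass++)
            for (uint32_t i = 0; i <= NWAYS; i++)
                lookup(i * NSETS * 64);
        return 0;
    }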


Cache writes and purges

So far we have only looked at what is involved in reading existing
information through a cache. When new information is created or old
information is overwritten or deleted, there are extra problems. The
first issue to consider is that a memory-based cache contains a copy of
some information, and the official master copy is kept somewhere else.


When we want to change a cached value, we have several choices. We
could update the copy in the cache only, which is very fast, but it now
differs from the master copy. We could update both copies immediately,
which may be slow, or we could throw away the cached copy and just
update the master (sensible if the cache-write cost is high). Another
issue to consider is that the cache may contain a large block of
information when we only want to update a small part of it. Do we have
to copy the whole block back to update the master copy? And what if
there are several caches for the same data, as in a CPU with several
levels of cache and other CPUs in a multiprocessor?
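
Those first three choices are roughly the classic write-back,
write-through, and write-invalidate policies. Here's a small sketch of
all three in C; struct cached and master_update() are hypothetical
names of my own, standing in for whatever holds the copy and the slow
path that updates the master:

    #include <stdint.h>

    struct cached {
        uint32_t addr;                /* where the master copy lives */
        int      data;                /* the cached copy */
        int      valid;
        int      dirty;
    };

    static int master[1024];          /* stand-in for the master store */

    static void master_update(uint32_t addr, int value)
    {
        master[addr] = value;         /* the slow path */
    }

    /* Write-back: update only the cached copy and mark it dirty; it now
     * differs from the master until it is eventually written back. */
    void write_back(struct cached *c, int value)
    {
        c->data  = value;
        c->dirty = 1;
    }

    /* Write-through: update both copies immediately; may be slow, but
     * the master never goes stale. */
    void write_through(struct cached *c, int value)
    {
        c->data = value;
        master_update(c->addr, value);
    }

    /* Write-invalidate: throw the cached copy away and update only the
     * master, sensible when the cache-write cost is high. */
    void write_invalidate(struct cached *c, int value)
    {
        c->valid = 0;
        master_update(c->addr, value);
    }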


There are too many possibilities, so I'll just look at the examples
we have discussed already. The CPU cache is optimized for speed, using
hardware to implement relatively simple functions with many of the
operations overlapped so that they all happen at once. On an UltraSPARC,
a cache block contains 64 bytes of data. A write changes between 1 and
8 bytes at a time. Each block contains some extra flags that indicate
whether the data is shared with another CPU cache and whether the data
has been rewritten.
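
In C, the per-block state just described might look like the following
sketch. The field names are illustrative, not the actual UltraSPARC tag
layout:

    #include <stdint.h>

    struct cache_block {
        uint32_t tag;             /* which 64-byte block of memory */
        uint8_t  data[64];        /* one UltraSPARC cache block */
        unsigned valid  : 1;
        unsigned shared : 1;      /* a copy may exist in another CPU's cache */
        unsigned dirty  : 1;      /* rewritten since it was fetched */
    };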


A write updates both the internal level 1 and the external level 2
caches. The data is not written out to memory; instead, the first time
a shared block is written to, a special signal is sent to all the other
CPUs in a multiprocessor system that invalidates any copies of the data
they hold. The block's flags are then changed from shared/clean to
private/dirty.
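
Putting the two previous sketches together, the write path might look
like this. send_invalidate() stands in for the hardware coherency
broadcast; it is not a real interface, and a real CPU does all of this
in overlapped hardware rather than in code:

    /* Stand-in for the coherency broadcast; on real hardware this is a
     * bus or interconnect transaction, not a function call. */
    static void send_invalidate(uint32_t tag) { (void)tag; }

    void cpu_write(struct cache_block *b, int offset, uint8_t value)
    {
        if (b->shared) {
            /* First write to a shared block: make the other CPUs
             * invalidate their copies, then take exclusive ownership. */
            send_invalidate(b->tag);
            b->shared = 0;            /* shared/clean -> private/dirty */
        }
        b->data[offset] = value;      /* updates L1 and L2, not memory */
        b->dirty = 1;                 /* memory is stale until write-back */
    }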
