NUMA

SMP in a Nutshell

In a previous newsletter (http://www.itworld.com/nl/lnx_tip/03302001),
I presented the principles of symmetric multiprocessing (SMP). An SMP
system combines multiple processors that operate under a single
operating system and access shared memory over a common bus. However,
SMP's scalability is rather limited; once the system includes more
than 16 processors, performance usually deteriorates. The problem lies
with the throughput of the shared bus that connects the processors to
memory. Every processor added puts more traffic on the same bus, so
the bus eventually becomes saturated and turns into a performance
bottleneck.
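
The saturation effect can be illustrated with a toy model. The sketch
below is only a rough approximation: the bus capacity and per-processor
demand are hypothetical figures, picked so that the bus fills up at 16
processors, and the function names are made up for this example.

```python
def aggregate_throughput(processors, per_cpu_demand, bus_capacity):
    """Memory traffic the shared bus can actually serve per unit of time."""
    return min(processors * per_cpu_demand, bus_capacity)


def per_cpu_throughput(processors, per_cpu_demand, bus_capacity):
    """Share of the bus that each processor ends up with."""
    total = aggregate_throughput(processors, per_cpu_demand, bus_capacity)
    return total / processors


# Hypothetical figures in arbitrary units; not measurements of real hardware.
PER_CPU_DEMAND = 100
BUS_CAPACITY = 1600

for n in (4, 8, 16, 32, 64):
    share = per_cpu_throughput(n, PER_CPU_DEMAND, BUS_CAPACITY)
    print(f"{n:3d} processors: {share:6.1f} units per processor")
```

In this model each processor gets its full demand up to 16 processors;
beyond that point the same bus capacity is split among more and more
processors, so each one gets less.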

Enter NUMA

NUMA (Non-Uniform Memory Access) is a relatively new method of
configuring a cluster of processors in a multiprocessor system so that
they share memory locally, thereby improving performance and users'
ability to expand the system beyond the inherent limits of SMP. NUMA
adds an intermediate level of memory shared among a few processors so
that most data accesses don't have to travel on the main bus. NUMA
defines three cache layers, where a lower number indicates a faster
cache: L1, L2 and L3. When a processor looks for data, it first looks
in the L1 cache on the processor itself (MMX processors, for instance,
have a private 32KB cache each), then in a larger L2 cache chip nearby,
then in the L3 cache that NUMA provides. Only if all the previous
lookups have failed does the processor seek the data in the external
memory, which is significantly slower. Put differently, NUMA introduces
an additional cache layer that reduces the number of accesses to the
external memory.
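
The lookup order described above can be sketched in a few lines. This
is a simplified model rather than a description of real hardware: the
latencies are hypothetical values in arbitrary units, each cache is
represented by a plain dictionary, and the lookup() function is an
illustrative name of my own.

```python
# Hypothetical latency of external memory, in arbitrary time units.
MEMORY_LATENCY = 100


def lookup(address, levels, memory):
    """Search L1, L2 and L3 in order; fall back to external memory.

    Returns (value, where_it_was_found, total_cost).
    """
    cost = 0
    for name, latency, contents in levels:
        cost += latency  # every level that is probed adds its latency
        if address in contents:
            return contents[address], name, cost
    return memory[address], "memory", cost + MEMORY_LATENCY


# Each level is (name, hypothetical latency, cached data).
levels = [
    ("L1", 1, {0x10: "a"}),
    ("L2", 5, {0x10: "a", 0x20: "b"}),
    ("L3", 20, {0x10: "a", 0x20: "b", 0x30: "c"}),
]
memory = {0x10: "a", 0x20: "b", 0x30: "c", 0x40: "d"}

for address in (0x10, 0x20, 0x30, 0x40):
    value, found_in, cost = lookup(address, levels, memory)
    print(f"{address:#x}: {value!r} found in {found_in}, cost {cost}")
```

An address that hits in L3 costs far less in this model than one that
goes all the way to external memory, which is the saving the extra
cache layer is meant to provide.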

NUMA-enabled SMP

A typical NUMA-based machine consists of multiple clusters, or units.
Each unit consists of four processors interconnected by a local bus to
a shared memory (the L3 cache) on a single motherboard. A common SMP
bus interconnects several units, thus forming an SMP system. Such a
system may contain up to 256 processors. NUMA views each of these
units as a node in the interconnection network. However, a user-level
application views all the individual clusters' memories as a single
memory.
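
To make the node layout more concrete, here is a minimal sketch of how
processors and addresses might map onto nodes. Only the figures of four
processors per node and 256 processors overall come from the text; the
per-node memory size, the contiguous split of the address space, and
all function names are assumptions made for illustration.

```python
PROCESSORS_PER_NODE = 4     # from the text: four processors per unit
MAX_PROCESSORS = 256        # from the text: upper limit of the system
NODE_MEMORY_SIZE = 1 << 20  # hypothetical amount of memory per node


def node_of_processor(cpu):
    """Return the node (unit) that a processor belongs to."""
    if not 0 <= cpu < MAX_PROCESSORS:
        raise ValueError(f"no such processor: {cpu}")
    return cpu // PROCESSORS_PER_NODE


def locate(address):
    """Split a flat address into (node, offset within that node).

    Assumes each node owns one contiguous slice of the address space.
    """
    return divmod(address, NODE_MEMORY_SIZE)


def is_local(cpu, address):
    """True if the address lives in the memory of the processor's own node."""
    node, _offset = locate(address)
    return node == node_of_processor(cpu)


print(MAX_PROCESSORS // PROCESSORS_PER_NODE, "nodes at most")
print(is_local(4, NODE_MEMORY_SIZE))  # processor 4 and the address share node 1
print(is_local(0, NODE_MEMORY_SIZE))  # processor 0 sits on node 0
```

The application only ever deals with the flat address; whether an
access stays on the local bus or crosses the common bus depends on
which node happens to hold it.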

For further information about NUMA, see:

http://www.zdnet.com/computershopper/edit/cshopper/content/9706/cshp0042.html
