August 05, 2003, 12:00 AM
Imagine a very, very large corn field. Think about a field the size of
Texas or perhaps a factor of ten bigger than that. Now consider how you
would harvest the ripe corn in that field.
Plan A: Get a single harvesting machine and start methodically chewing
through the corn, strip by strip.
Plan B: Get many, many harvesting machines and get them methodically
chewing through strips of corn in parallel with each other.
Plan A, I think you will agree, demonstrates about as much higher-order
intelligence as a single ear of corn in that field. Plan B is obviously
the way to go. Why? Because from a harvesting point of view, one strip
of corn in one part of Texas is utterly unrelated to any other. Thus
the harvesting can happen in parallel. Blindingly obvious, right?
Parallelizing the work has other benefits apart from throughput. We can
use cheap and cheerful machines to do the work rather than a very
expensive uber-machine. Who cares if the individual cheap machines break
down? We just replace the ones that fail and forge on. The chances of
corn production stopping completely because of hardware failure are
vanishingly small. What are the chances of hundreds,
perhaps thousands of independent machines all failing at the same time?
Effectively zero.
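
To put a rough number on "effectively zero": if each machine fails
independently with probability p in some window of time, the chance that
all n of them fail together is p multiplied by itself n times. The sketch
below assumes independent failures and uses a made-up 1% failure
probability; prob_all_fail is a hypothetical helper name, not part of any
real system.

```python
def prob_all_fail(p_single, n_machines):
    """Probability that every one of n independent machines fails at once."""
    return p_single ** n_machines

# Assumed figure: each machine has a 1% chance of being down in a given hour.
for n in (1, 10, 100):
    print(n, prob_all_fail(0.01, n))
```

Even with only ten such machines the chance of a total stoppage is about
one in 10^20. The independence assumption is doing the heavy lifting here:
machines that share a power supply or a network switch can fail together.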
In IT terminology, the harvesting system exhibits high availability,
fault tolerance and a linear scaling relationship between throughput and
processing power. Nice.
Now let's switch from corn to, say, customers in our reveries. Imagine a
very large set of customers. We need to look at what they have bought
from us in the last month and generate invoices, one for each customer.
Plan A: Get a single invoice processing "machine". Start at one corner
of the customer list and work methodically through to the end.
Plan B: Get many, many invoice processing machines and get them
methodically chewing through strips of customers in parallel.
From a common sense perspective, Plan B is just as compelling as it is
in the corn harvesting example. However, by and large, enterprise
computing does not work that way. We go for Plan A most of the time.
We buy bigger, faster, more expensive individual machines for tasks
that could be done faster and cheaper by multiple machines working in
parallel.
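
For the curious, Plan B for invoices can be sketched in a few lines. This
is only an illustration under stated assumptions: customers are
(name, purchases) pairs, an "invoice" is just a total, and worker threads
on one box stand in for the many separate machines. The names make_strips,
invoice_strip and invoice_all are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def make_strips(customers, n_strips):
    """Split the customer list into n_strips contiguous, near-equal strips."""
    size, extra = divmod(len(customers), n_strips)
    strips, start = [], 0
    for i in range(n_strips):
        end = start + size + (1 if i < extra else 0)
        strips.append(customers[start:end])
        start = end
    return strips

def invoice_strip(strip):
    """Build one invoice per customer; no strip depends on any other."""
    return [{"customer": name, "total": sum(purchases)}
            for name, purchases in strip]

def invoice_all(customers, n_workers=4):
    """Hand one strip to each worker and gather the invoices in order."""
    strips = make_strips(customers, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(invoice_strip, strips)  # keeps strip order
    return [inv for strip_invoices in results for inv in strip_invoices]
```

The point of the sketch is the shape, not the thread pool: because each
strip is self-contained, the workers could just as well be processes or
cheap machines on a rack.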
In fact, Plan B is even more compelling in computing than it is in
harvesting corn. The unit cost of processing machines continues to fall
through the floor. So much so that a large Web search engine provider,
which makes extensive use of parallelism, does not even replace
individual processing machines when they break down. Why? Because the
cost of dispatching an engineer to replace one is higher than the cost of
just tacking a new one on the end of the processing rack. This is a
pretty radical shift in the economics of computing.
What an embarrassment of computing power riches surrounds us! Yet, by
and large, in enterprise computing, we do the equivalent of heading off
into a corner of the corn field with a single machine, working through
our data strip by strip. If we were engineers in the agricultural
sector we would be laughed at. Seriously.
Two questions arise, I guess. Firstly, why do we gravitate towards Plan A
in enterprise computing? Secondly, what should be done about it?
I think the first question is best answered by saying that we do it the
way we do it because we have *always* done it that way. Ever since John
von Neumann's brilliant insights into general purpose processing
machines[1], we have been building mental models of processing around
the idea of a single, all-powerful CPU (the 'C' stands for 'Central'
after all). The so-called von Neumann architecture is endemic in the way
we think about systems.
It is a curse of sorts. The easiest way to see it in action is to ask a
developer how long it will take to process a million invoices if each
invoice takes 10 seconds of processing time. After some calculation the
answer will be that it will take about 116 days (ten million seconds). Without even
thinking about it, most developers will serialize the invoice processing
and assume that everything will be routed through a single processor,
which will chew through invoices at the rate of one every 10 seconds.
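
The back-of-envelope sum is easy to check, and so is the effect of
dropping the single-processor assumption. The sketch below assumes the
invoices are independent, every machine is equally fast, and there is no
coordination overhead; elapsed_days is a hypothetical helper.

```python
import math

SECONDS_PER_DAY = 86_400

def elapsed_days(n_invoices, seconds_each, n_machines=1):
    """Elapsed days when invoices are split evenly across identical machines."""
    # The busiest machine handles ceil(n_invoices / n_machines) invoices.
    per_machine = math.ceil(n_invoices / n_machines)
    return per_machine * seconds_each / SECONDS_PER_DAY

for machines in (1, 100, 1000):
    print(machines, round(elapsed_days(1_000_000, 10, machines), 2))
```

Under those idealised assumptions, one machine needs roughly 116 days, a
hundred machines a little over a day, and a thousand machines under three
hours.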
The answer to the second question doubles as a rallying cry. There is a
ton of lore about parallel computing out there. There are real systems
such as Beowulf clusters in the Linux world that show the power of
massive parallelism. There is the Grid Forum[2], which is emerging as a
focal point for activity in parallel computing technologies.
Now here is the rallying cry. A lot of the interest in massive
parallelism comes from scientists interested in fluid dynamics, N-body
problems, quantum chemistry and the like. All really important stuff, but
where are the commercial IT people?













