What is a CPU? It is a thing in the logical center of a computer that does the vast bulk of the number crunching involved in running programs. The 'C' stands for 'central'.
Ok. How do you make the best possible use of your CPU's processing power? How do you eke every last scintilla of performance out of it? It is a simple process really. You drop down to the fundamental instruction set of your CPU - its so-called 'machine code'. You arrange your programs directly in this code. There is no need to translate this stuff into something your CPU will understand. It understands this stuff directly.
Machine code programming truly is programming in the raw. A high-octane, full-on, BASE-jumping sort of experience. By and large, developers avoid writing programs directly at the machine code level. Instead, they come up with better and better ways to generate machine code from their high-level languages such as C++, Python, Fortran etc.
Programs at the machine code level either work or they do not work - just like every other program - but down here at the machine code level, failures tend to be more dramatic. Programs that do not work have a habit of 'hanging' the machine, a condition from which a hard reboot is often the only form of resuscitation.
Getting the best possible use out of your CPU's processing power used to be as simple as generating the best possible machine code. Those days are at an end. To see why, we need to step back a little bit. For many, many years we have had systems involving multiple CPUs and programming models that involve coordinating the activity of different CPUs working in parallel. Such forms of programming, with names like distributed computing, concurrent programming, parallel processing etc. are well known, well studied and have one thing in common.
They are all hard.
When I say these things (hereinafter lumped together under the rubric 'concurrency') are 'hard' I mean that they make the heads of even very experienced developers hurt. Given a choice, developers will avoid concurrency because of all the added complexity that comes with that territory. Concurrent programs are hard to write, hard to read and hard to debug. A bad combination.
Although developers dislike the complexity of concurrency, they are very, very fond of getting the most out of their CPUs. Historically, writing your program down at the machine code level was all you needed to do to ensure that you had maximum access to your processing power.
However, recent developments in processor architectures, such as hyper-threading [1] (which makes one physical processor look like two logical ones to software), are set to break this simple relationship between access to processing power and machine code programming. Processor makers are also starting to put multiple classical 'CPUs' on a single chunk of silicon. Consequently, in order to make the best use of the available power, it is no longer sufficient to program at the machine code level. The key to true performance maximization is now concurrency - getting those logically separate CPUs doing useful things at the same time.
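To make "doing useful things at the same time" a little more concrete, here is a minimal sketch in Python of the usual pattern: split the work into chunks, hand one chunk to each worker, then combine the partial results. The function names (sum_of_squares, parallel_sum_of_squares) are made up for illustration. One caveat: standard CPython's global interpreter lock normally stops threads from running pure-Python arithmetic truly in parallel, so treat this as the shape of the idea rather than a benchmark. Swapping in ProcessPoolExecutor from the same module is the usual way to spread this kind of work across separate CPUs.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Work done by a single worker: sum the squares of its own slice."""
    return sum(n * n for n in chunk)


def parallel_sum_of_squares(numbers, workers=None):
    """Split `numbers` into roughly one chunk per worker and combine the partial sums."""
    # Default to one worker per logical CPU reported by the operating system.
    workers = workers or os.cpu_count() or 1
    # Ceiling division, so every element lands in exactly one chunk.
    chunk_size = max(1, -(-len(numbers) // workers))
    chunks = [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is handed to a worker; results come back in chunk order.
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```

Even in this tiny case the programmer has to decide how to divide the work and how to stitch the answers back together, and this is the easy kind of concurrency, where the workers share nothing.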
Unfortunately, as we have seen, concurrency is hard. Very hard. It will be interesting to see how the tools of the trade change at the software level as a result of this change at the hardware level. Concurrent programming with languages and tools that do not help you, and do not protect you from the underlying complexity of concurrency, is a recipe for hypertension. Yet more layers of tool support on top of 'classical' single-CPU languages is one possibility. A quantum leap into the limelight for languages like Erlang [2] is another.
Whatever happens, it is going to become increasingly difficult for developers to avoid the complex and subtle issues that are associated with concurrency. My head hurts already.
[1] http://arstechnica.com/articles/paedia/cpu/hyperthreading.ars
[2] http://c2.com/cgi/wiki?SmugErlangWeenie