One of my ambitions for the coming year of "Smart Development" is to pass on HPC (high-performance computing) tips based on our work in the area. I'm just now "gearing up" again after an absence of several years, so at this point I have only a few general observations from our own work to share: there are a lot of idle CPU (and GPU!) cores out there, and even the ones that are computing something often waste much of their time.
I do respect Dr. Dobb's efforts to write up ideas in HPC, though; in particular, I want to recommend Stephen Blair-Chappell's note on "... Performance Gains Using SSE Intrinsics". A speed-up by a factor of twenty at the cost of a little assembly-language recoding: that's the kind of result that deserves more attention. While most of our own work is at higher algorithmic levels, it's a treat to return to assembly when we can. One principle already serves us well at every level: when in doubt, look for cache misses. Waiting on memory after a miss consumes a shocking share of the time in current computations.
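To make the cache-miss point concrete, here is a minimal sketch in C. It is my own illustration and is not taken from the note recommended above. It sums the same row-major matrix twice, once along rows and once along columns. The function names and the two helpers (make_matrix, seconds_for) are hypothetical, invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Sum an n-by-n row-major matrix row by row: consecutive accesses are
   adjacent in memory, so every cache line that is fetched gets fully used. */
double sum_row_order(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            total += a[i * n + j];
    return total;
}

/* Same sum, column by column: consecutive accesses are n doubles apart,
   so for large n most accesses land on a different cache line. */
double sum_column_order(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            total += a[i * n + j];
    return total;
}

/* Hypothetical helper: allocate an n-by-n matrix filled with small integers,
   so that both traversal orders produce exactly the same floating-point sum.
   Returns NULL if the allocation fails; the caller frees the result. */
double *make_matrix(size_t n)
{
    double *a = malloc(n * n * sizeof *a);
    if (a != NULL)
        for (size_t k = 0; k < n * n; k++)
            a[k] = (double)(k % 7);
    return a;
}

/* Hypothetical helper: CPU seconds spent in one call of a summing function.
   The computed sum is stored through result so the work is not discarded. */
double seconds_for(double (*sum)(const double *, size_t),
                   const double *a, size_t n, double *result)
{
    assert(a != NULL && result != NULL);
    clock_t start = clock();
    *result = sum(a, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To try it, build a matrix with make_matrix for an n in the thousands, pass each summing function to seconds_for, and compare the two times. Both calls do identical arithmetic and return the same total, so any gap between them comes from memory access order alone. How large the gap is depends on the machine, the matrix size, and the compiler flags (an aggressive optimizer may even interchange the loops), so measure before drawing conclusions.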