SSE Intrinsics and HPC

Scientific programming still has plenty of room to improve

One of my ambitions for the coming year of "Smart Development" is to pass on HPC (high-performance computing) tips based on our work in the area. I'm just now gearing up again after an absence of several years, so I have only a few general observations to share so far: there are a lot of idle CPU (and GPU!) cores out there, and even the ones that are computing something often waste much of their time.

I do respect Dr. Dobb's efforts to write up ideas in HPC, and I want to recommend, in particular, Stephen Blair-Chappell's note on "... Performance Gains Using SSE Intrinsics". A speed-up by a factor of twenty at the cost of a little recoding with intrinsics, which sit just above assembly language: that's the kind of result that deserves more attention. While most of our own work is at higher algorithmic levels, it's a treat to get back down near the assembly level when we can. One principle already serves us well on all the levels: when in doubt, look for cache misses. Waiting on memory after a cache miss consumes a shocking share of the run time in current computations.
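To make the cache-miss principle concrete, here is a minimal sketch of my own (it is not taken from the Dr. Dobb's article). It assumes a matrix stored row-major in one flat array, and the function names `sum_row_major` and `sum_col_major` are hypothetical. Both functions do exactly the same additions; only the order in which they touch memory differs.

```c
#include <stddef.h>

/* Illustrative helpers (hypothetical names). The matrix is assumed to be
   stored row-major: element (i, j) lives at a[i * cols + j]. */

/* Visits elements in storage order, so consecutive reads tend to fall
   in the same cache line. */
double sum_row_major(const double *a, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            total += a[i * cols + j];
    return total;
}

/* Same additions, but each read jumps `cols` elements ahead of the
   previous one, so on a large matrix most reads can land in a
   different cache line. */
double sum_col_major(const double *a, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            total += a[i * cols + j];
    return total;
}
```

On a matrix much larger than the cache, I would expect the second version to run noticeably slower on typical hardware, though how much depends on the machine, the compiler, and the matrix shape, so time it yourself before drawing conclusions.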
