Why Threading Building Blocks are the best multicore programming solution

A new evaluation of popular parallel programming languages finds that the C++ library provides the best combination of usability and performance

An AMD Opteron 6282SE multicore processor

Credit: flickr/htomari

Multicore programming is a tricky problem. Developers want to take advantage of multiple cores to improve application performance and scalability, but doing so often requires extra programming overhead and know-how. While there are efforts underway to handle the issue at the CPU level, for now, software developers who want to distribute the processing load must handle it programmatically. But is there one language that’s better than others for parallel programming?

Answering that question was the goal of a new study by researchers in Switzerland. In their paper, Benchmarking Usability and Performance of Multicore Languages, Sebastian Nanz, Scott West, Kaue Soares da Silveira, and Bertrand Meyer compared four popular approaches to multicore programming. Namely:

Chapel - An object-oriented language developed by Cray as part of the DARPA High Productivity Computing Systems program. Chapel uses threads to implement independent computations.

Cilk - A general purpose language initially developed at MIT based on C. Cilk leaves it up to the programmer to explicitly specify what parts of the program can be executed in parallel and the runtime system handles the load balancing for these components.

Go - A language developed by Google for systems programming. Multicore programming is done through goroutines, concurrently executing functions that communicate and synchronize their execution over channels.

Threading Building Blocks (TBB) - A C++ library developed by Intel for making use of multicore processors. Developers express operations that are to take advantage of multiple cores and their dependencies through high level algorithms. The runtime environment then dynamically distributes the work across the various cores.
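To make the channel-based style described for Go above more concrete, here is a minimal sketch that splits a sum across several concurrent functions and collects the partial results over a channel. It is not code from the study; the names sumChunk and parallelSum are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// sumChunk adds up one slice of the input and sends the partial result on out.
// (Hypothetical helper, not taken from the benchmark.)
func sumChunk(chunk []int, out chan<- int) {
	total := 0
	for _, v := range chunk {
		total += v
	}
	out <- total
}

// parallelSum splits data into at most `workers` chunks, sums each chunk in
// its own goroutine, and combines the partial sums received over a channel.
func parallelSum(data []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	out := make(chan int, workers)
	chunkSize := (len(data) + workers - 1) / workers // ceiling division
	started := 0
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		go sumChunk(data[start:end], out)
		started++
	}
	// Receiving from the channel also synchronizes with the workers.
	total := 0
	for i := 0; i < started; i++ {
		total += <-out
	}
	return total
}

func main() {
	data := make([]int, 1000)
	for i := range data {
		data[i] = i + 1
	}
	fmt.Println(parallelSum(data, 4)) // 1 + 2 + ... + 1000 = 500500
}
```

Note that the programmer decides how to partition the work and how to gather results. The study's observation that Go needed the most code for parallel solutions is consistent with this kind of explicit plumbing.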

To evaluate these approaches, the researchers had experienced programmers write both sequential and parallel solutions to common programming tasks (e.g., random number generation, histogram thresholding, multiplying matrices and vectors), then had their work further refined by experts in each language. The languages were evaluated for both usability (metrics: lines of code and coding time) and performance (metrics: execution time and speedup in runtime over sequential solutions) across an increasing number of processors. Here are the key findings:

Chapel isn’t ready for prime time - As the authors noted, the development of Chapel, to date, has focused more on usability than performance, and it showed in their results. Chapel generated the smallest amount of source code but had the longest execution times. As for scalability, the speedup of Chapel's parallel code over sequential execution consistently plateaued after 4-8 cores, at about 2-3 times the sequential speed.

Go is all over the map - Go consistently required the most code for parallel processing, but its performance varied greatly from task to task. For example, it was really fast and scaled well for random number generation but was quite slow and provided little speedup when chaining problems together.

Threading Building Blocks offered the best combination of usability and performance - TBB consistently had the quickest coding times and, along with Cilk, had the shortest execution times and the best scalability. At 32 cores, parallel coding in both TBB and Cilk saw a speedup of about 20 times over sequential programs for many tasks. As the authors wrote,

“... the library [TBB] provides is the most comprehensive of the four languages, containing algorithmic skeletons, task groups, synchronization and message passing facilities. The high level parallel algorithms were sufficient to implement every task in the benchmark set without dropping down to lower level primitives such as manual task creation and synchronization. Being a library for a well known language, it also has the fastest coding times.”
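To put the speedup figures above in context, the sketch below computes speedup (sequential time divided by parallel time) and a per-core efficiency figure. The function names and the timings are hypothetical; the timings are chosen only to reproduce the roughly 20-times and 3-times speedups mentioned in the findings, and are not measurements from the study.

```go
package main

import "fmt"

// speedup returns how many times faster the parallel run is than the
// sequential one.
func speedup(seqSeconds, parSeconds float64) float64 {
	return seqSeconds / parSeconds
}

// efficiency returns the speedup per core; 1.0 would be perfect linear scaling.
func efficiency(s float64, cores int) float64 {
	return s / float64(cores)
}

func main() {
	// Hypothetical timings: 100 s sequential, 5 s parallel gives a 20x speedup.
	s := speedup(100.0, 5.0)
	fmt.Printf("speedup: %.1fx\n", s)                               // speedup: 20.0x
	fmt.Printf("efficiency on 32 cores: %.3f\n", efficiency(s, 32)) // 0.625

	// A plateau of about 3x on 8 cores, as reported for Chapel.
	fmt.Printf("efficiency at 3x on 8 cores: %.3f\n", efficiency(3.0, 8)) // 0.375
}
```

Under these assumptions, a 20-times speedup on 32 cores means each core contributes about 63 percent of what ideal linear scaling would deliver, while a 3-times speedup on 8 cores corresponds to under 40 percent.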

For now, at least, if you’re looking to get into multicore programming, TBB appears to be the best of these solutions. Or, perhaps, developers should just wait for the hardware folks to solve the problem for them. It’s your choice!
