Parallelization libraries

Abhishek Bhattacharjee, Margaret Martonosi, Gilberto Contreras

doi:10.1145/1952998.1953003

Open Access

Journal: ACM Transactions on Architecture and Code Optimization
Publication Date: Feb 5, 2011
Citations: 20

Affiliation: Rutgers, The State University of New Jersey, Princeton University, Nvidia (United States)

Keywords: Intel's Threading Building Blocks, Threading Building Blocks

Abstract

Creating efficient, scalable dynamic parallel runtime systems for chip multiprocessors (CMPs) requires understanding the overheads that manifest at high core counts and small task sizes. In this article, we assess these overheads on Intel's Threading Building Blocks (TBB) and OpenMP. First, we use real hardware and simulations to detail various scheduler and synchronization overheads. We find that these can amount to 47% of TBB benchmark runtime and 80% of OpenMP benchmark runtime. Second, we propose load-balancing techniques, such as occupancy-based and criticality-guided task stealing, to boost performance. Overall, our study provides valuable insights for creating robust, scalable runtime libraries.
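
The abstract names occupancy-based task stealing without defining it here. The following is a minimal sketch under the assumption that "occupancy-based" means an idle worker steals from the worker whose task queue currently holds the most tasks, in contrast to picking a victim at random. All function names are hypothetical and illustrative; they are not part of the TBB or OpenMP APIs, and the single-threaded model ignores the locking a real runtime would need.

```python
import random
from collections import deque


def pick_victim_random(queues, thief, rng):
    """Baseline policy: choose any other worker uniformly at random."""
    candidates = [i for i in range(len(queues)) if i != thief]
    return rng.choice(candidates)


def pick_victim_by_occupancy(queues, thief):
    """Assumed occupancy-based policy: choose the other worker whose
    queue holds the most tasks; return None if every other queue is empty."""
    candidates = [i for i in range(len(queues)) if i != thief]
    victim = max(candidates, key=lambda i: len(queues[i]))
    return victim if queues[victim] else None


def steal(queues, thief, victim):
    """Move the oldest task from the victim's queue to the thief's queue."""
    if victim is None or not queues[victim]:
        return None
    task = queues[victim].popleft()
    queues[thief].append(task)
    return task


# Worker 0 is idle; worker 2 has the fullest queue.
queues = [deque(), deque(["a"]), deque(["b", "c", "d"]), deque()]
victim = pick_victim_by_occupancy(queues, thief=0)
stolen = steal(queues, 0, victim)
```

A random policy may probe an empty queue and waste a steal attempt, which is one plausible reason to consult queue occupancy first; whether that pays off depends on how cheaply occupancy can be read across cores.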
