Abstract

Now that chip multiprocessors dominate the processor market, developing a parallel programming model that strikes a proper trade-off between productivity and efficiency has become increasingly important. As a typical fine-grained parallelism model, Intel Threading Building Blocks (TBB) simplifies parallel programming through runtime scheduling. Despite this simplicity, TBB incurs non-trivial runtime overhead, which may grow as the thread count increases. In this work, we conduct experiments on real commodity hardware to evaluate the performance scalability of TBB using the PARSEC benchmark suite. We first compare TBB with Pthreads to show that TBB applications can achieve performance comparable to that of Pthreads applications. To find the performance bottlenecks of TBB applications, we then measure the runtime overhead of TBB, focusing on three basic TBB runtime activities. The results offer insights that can guide the development of scalable runtime libraries and architectural support for alleviating performance bottlenecks.

