Abstract

The OpenMP standard is the primary mechanism used at high performance computing facilities to allow intra-process parallelization. In contrast, many HEP-specific software packages (such as CMSSW, GaudiHive, and ROOT) make use of Intel’s Threading Building Blocks (TBB) library to accomplish the same goal. In these proceedings we will discuss our work to compare TBB and OpenMP when used for scheduling algorithms to be run by a HEP-style data processing framework. This includes both the scheduling of different interdependent algorithms to be run concurrently and the scheduling of concurrent work within one algorithm. As part of the discussion we present an overview of the OpenMP threading model. We also explain how we used OpenMP when creating a simplified HEP-like processing framework. Using that simplified framework, and a similar one written using TBB, we will present performance comparisons between TBB and different compiler versions of OpenMP.

Highlights

  • The CMS experiment at the LHC has used a multi-thread enabled data processing framework, CMSSW [1], for large scale data processing since the start of LHC Run 2 in 2016

  • We have found that when we communicate with High Performance Computing (HPC) specialists, they often ask why we are not using OpenMP for concurrency

  • The #pragma omp parallel statement starts threads that are used to process the C++ block directly following the statement. Those threads can only be used by that parallel construct. (This is relevant for the case of nested parallel blocks discussed in subsection 2.3.) The thread that first encountered the pragma statement, which OpenMP refers to as the master thread, joins in processing the block (a minimal sketch follows this list)
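
A minimal, self-contained sketch of the parallel construct described in the last highlight is shown below. It is our own illustration, not code from the paper; the printed message and file name are arbitrary.

    // Compile with OpenMP support, e.g. g++ -fopenmp parallel_example.cpp
    #include <cstdio>
    #include <omp.h>

    int main() {
      // The pragma starts a team of threads; every thread in the team,
      // including the master thread that encountered the pragma,
      // executes the block that directly follows.
      #pragma omp parallel
      {
        std::printf("hello from thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
      }
      return 0;
    }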

Introduction

The CMS experiment at the LHC has used a multi-thread enabled data processing framework, CMSSW [1], for large-scale data processing since the start of LHC Run 2 in 2016. Using multiple threads allows the framework to use substantially less memory per CPU than running many single-threaded jobs, allowing jobs to fit within CMS’s memory constraints. This framework makes use of Intel’s Threading Building Blocks (TBB) library [2] to handle scheduling of processing tasks across the limited number of threads available to the process. The motivation for comparing TBB with OpenMP is the growing need for CMS to exploit resources from High Performance Computing (HPC) facilities in the coming years; these facilities typically support only OpenMP as the intra-process concurrency mechanism. These proceedings first review the relevant OpenMP constructs and describe the demonstrator frameworks used for the comparison. This is followed by the experimental setup used to do the measurements as well as the results of those measurements
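
As a hedged illustration of the TBB task scheduling mentioned above, the sketch below uses the common tbb::task_group pattern. It is not the framework's actual scheduling code, and the algorithm functions are hypothetical placeholders.

    // Minimal sketch of scheduling two independent "algorithms" as TBB tasks.
    #include <tbb/task_group.h>
    #include <cstdio>

    // Hypothetical placeholders for the work done by framework algorithms.
    void runAlgorithmA() { std::printf("running algorithm A\n"); }
    void runAlgorithmB() { std::printf("running algorithm B\n"); }

    int main() {
      tbb::task_group group;
      // Each run() call hands a task to TBB's scheduler, which executes the
      // tasks on its worker threads within the process's limited thread pool.
      group.run([] { runAlgorithmA(); });
      group.run([] { runAlgorithmB(); });
      group.wait();  // block until both tasks have completed
      return 0;
    }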

Review of OpenMP Commands
Construct: omp parallel
Construct: omp for
Nested parallel blocks
Construct: omp task
Construct: omp taskloop
Demonstrator Frameworks
Experimental Setup and Results
Conclusion
