Abstract

As core counts in processors increase, it becomes harder to schedule and distribute work in a timely and scalable manner. This article enhances the scalability of parallel loop schedulers by specializing schedulers for fine‐grain loops. We propose a low‐overhead work distribution mechanism for a static scheduler that uses no atomic operations. We integrate our static scheduler with the Intel OpenMP and Cilk Plus parallel task schedulers to build hybrid schedulers. Compiler support enables efficient reductions for Cilk without changing the programming interface of Cilk reducers. Detailed, quantitative measurements demonstrate that our techniques achieve scalable performance on a 48‐core machine, with scheduling overhead 43% lower than Intel OpenMP and 12.1× lower than Cilk. We demonstrate consistent performance improvements on a range of HPC and data analytics codes. Performance gains grow as loops become finer‐grain and thread counts increase. We consistently observe 16%–30% speedup on 48 threads, with a peak of 2.8× speedup.

Highlights

  • While Moore’s Law remains active, every new processor generation has an increasing number of CPU cores

  • Workers initialize local copies of reduction variables and execute work sent by the master

  • The master thread waits for the workers to complete, and partial results are reduced for reduction variables

  • For the parallel loop model, the worker threads are associated with a specific master, making some synchronization steps redundant


Introduction

While Moore’s Law remains active, every new processor generation has an increasing number of CPU cores. Scheduling and distributing workload on large-scale shared-memory machines becomes increasingly important for making efficient use of the hardware. The runtime overhead caused by scheduling, work distribution and synchronization [1] can make some parallel codes too fine-grain for parallel execution to be worthwhile. This overhead, which grows with the degree of parallelism, can limit the scalability of schedulers. This work focuses on fine-grain, microsecond-scale parallel loops, comparable in duration to the overhead of state-of-the-art schedulers on current hardware. We analyze commonly used loop scheduling techniques and propose a “half-barrier” pattern to remove redundant synchronization.

