Abstract

This paper describes C ronus, a platform for parallelizing general nested loops. General nested loops contain complex loop bodies (assignments, conditionals, repetitions) and exhibit uniform loop-carried dependencies. The novelty of C ronus is twofold: (1) it determines the optimal scheduling hyperplane using the QuickHull algorithm, which is more efficient than previously used methods, and (2) it implements a simple and efficient dynamic rule (successive dynamic scheduling) for the runtime scheduling of the loop iterations along the optimal hyperplane. This scheduling policy enhances data locality and improves the makespan. C ronus provides an efficient runtime library, specifically designed for communication minimization, that performs better than more generic systems, such as Berkeley UPC. Its performance was evaluated through extensive testing. Three representative case studies are examined: the Floyd–Steinberg dithering algorithm, the Transitive Closure algorithm, and the FSBM motion estimation algorithm. The experimental results corroborate the efficiency of the parallel code. The tests show speedup ranging from 1.18 (out of the ideal 4) to 12.29 (out of the ideal 16) on distributed-systems and 3.60 (out of 4) to 15.79 (out of 16) on shared-memory systems. C ronus outperforms UPC by 5–95% depending on the test case.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call