Abstract

Loops are the main source of parallelism in scientific programs, and several techniques have been developed to detect parallelism in these loops and transform them into parallel form. In this dissertation, compile-time transformations and efficient parallel execution of loops with various types of dependencies are investigated. First, Doacross loops with uniform dependencies are considered for execution on distributed memory parallel machines (multicomputers). Most known Doacross loop execution techniques can be applied efficiently only to shared memory parallel machines. In this thesis, a code reordering technique, improvements to partitioning strategies, and a method for balancing communication and parallelism are presented to reduce the execution time of Doacross loops on multicomputers.

As with most parallelizing transformations, only single loop nests are considered in the first part of this dissertation. However, parallelizing each loop nest in a program separately, even when an optimal execution is obtained for each nest, may not yield an efficient execution of the program as a whole because of communication overhead across loops in a multicomputer environment. Hence, across-loop data dependence representation and analysis are introduced in this work to improve the parallel execution of the entire code. Our contribution consists of finding and representing data dependencies whose sources and destinations are subspaces of the iteration space, mainly those common across loops. This dependence information is used in this thesis to improve global iteration space partitioning, automatic generation of communication statements across loops, and index alignment.

The final part of this dissertation presents new parallelizing techniques for loops with irregular and complex dependencies. Various data dependence analysis algorithms can be found in the literature, even for loops with complex array indexing. However, the progress in data dependence testing has not been matched by comparable improvements in restructuring transformations for loops with complex dependencies; such loops are mostly executed serially. Our parallelizing techniques for these loops consist of identifying regions of the iteration space in which all iterations can be executed in parallel. The advantages of all the transformations presented in this dissertation are: (1) they significantly reduce the execution time of loops with various types of dependencies, as demonstrated in this work on the MasPar machine, and (2) they can be implemented at compile time, which makes the task of parallel programming easier.
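As a point of reference (not drawn from the dissertation itself), the C sketch below shows the kind of loop nest the first part targets: a Doacross loop whose dependence is uniform and known at compile time. The array names, bounds, and the distance vector (1, 0) are assumptions made for illustration.

```c
#define N 1024

/* A Doacross loop nest with a uniform dependence of distance (1, 0):
 * iteration (i, j) reads a[i-1][j], which was written by iteration
 * (i-1, j). The outer loop therefore cannot run fully in parallel,
 * but all iterations of the inner loop are independent of each other,
 * so successive rows can be pipelined across the processors of a
 * multicomputer once their inputs arrive. */
void doacross(double a[N][N], const double b[N][N]) {
    for (int i = 1; i < N; i++)        /* carries the (1, 0) dependence */
        for (int j = 0; j < N; j++)    /* iterations mutually independent */
            a[i][j] = a[i - 1][j] + b[i][j];
}
```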
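Similarly, a minimal sketch of the irregular, complex dependencies treated in the final part: the dependence distance below grows with the iteration number, so no uniform distance vector describes it, yet the iteration space still contains regions whose iterations are all mutually independent. The specific subscripts and the region bound are illustrative assumptions, not the dissertation's own example.

```c
#define N 1024

/* Iteration i writes A[2*i], and iteration 2*i later reads that same
 * element as A[i], so the dependence distance equals i: it is not
 * constant across the iteration space. However, any block of
 * iterations [L, U) with 2*L >= U carries no internal dependence,
 * since for every i >= L the dependent iteration 2*i >= U falls
 * outside the block. Blocks such as [1,2), [2,4), [4,8), ... are
 * therefore fully parallel regions of the kind such techniques
 * identify at compile time. */
void complex_dep(double A[2 * N]) {
    for (int i = 1; i < N; i++)
        A[2 * i] = A[i] + 1.0;
}
```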
