Compilation scheme for near fine grain parallel processing on a multiprocessor system without explicit synchronization

W Ogata, H Kasahara, K Fujimoto, M Oota

doi:10.1109/pacrim.1995.519536

Abstract

Fortran parallelizing compilers for multiprocessor systems have relied on loop parallelization. However, there remain loops to which the Do-all and Do-across techniques cannot be applied effectively because of loop-carried dependences and conditional branches out of the loops. In addition, such compilers do not exploit the parallelism among subroutines, loops, and basic blocks, or the near-fine-grain parallelism inside basic blocks that lie outside loops or inside sequential loops. It is therefore important to use coarse-grain and near-fine-grain parallelism in addition to loop parallelization. With this in mind, the authors propose a multigrain parallel processing scheme that hierarchically combines coarse-grain parallel processing (macro-dataflow processing), loop concurrency, and near-fine-grain parallel processing. To minimize the data transfer overhead and the total processing time, the proposed compilation scheme uses a static scheduling algorithm called CP/DT/MISF (critical path/data transfer/most immediate successors first). To minimize the synchronization overhead, the compilation scheme eliminates all synchronization code by scheduling the generated code precisely at the machine-clock level for the target multiprocessor system, OSCAR. The scheme has been implemented on OSCAR, and a performance evaluation on that machine shows that the proposed near-fine-grain parallel processing without synchronization reduces the processing time of test programs by 30% to 40% compared with conventional near-fine-grain parallel processing that uses synchronization code.
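The static scheduling idea can be illustrated with a minimal sketch. The abstract names CP/DT/MISF but does not spell out its steps, so everything below is an assumption: a simplified list scheduler that ranks ready tasks by critical-path length, breaks ties by the number of immediate successors, and places each task on the processor where it can start earliest once a fixed inter-processor transfer delay is counted. The function `list_schedule`, its parameters, and the four-task graph are hypothetical and are not taken from the paper or from the OSCAR compiler.

```python
from functools import lru_cache


def list_schedule(cost, succs, transfer, num_procs):
    """Statically schedule an acyclic task graph (illustrative sketch only).

    cost:      dict task -> execution time in clock cycles
    succs:     dict task -> list of immediate successor tasks
    transfer:  cycles needed to pass a result to a different processor
    Returns:   dict task -> (processor, start_cycle, finish_cycle)
    """
    preds = {t: [] for t in cost}
    for t, ss in succs.items():
        for s in ss:
            preds[s].append(t)

    @lru_cache(maxsize=None)
    def level(t):
        # Longest path from t to an exit task, including t itself.
        return cost[t] + max((level(s) for s in succs.get(t, [])), default=0)

    proc_free = [0] * num_procs  # cycle at which each processor becomes idle
    placed = {}
    remaining = set(cost)
    while remaining:
        # Tasks whose predecessors are all scheduled; sorted for determinism.
        ready = sorted(t for t in remaining
                       if all(p in placed for p in preds[t]))
        # Assumed priority: critical path first, then most immediate successors.
        task = max(ready, key=lambda t: (level(t), len(succs.get(t, []))))

        def earliest_start(proc):
            # Operands from another processor arrive `transfer` cycles later.
            arrivals = [placed[p][2] + (0 if placed[p][0] == proc else transfer)
                        for p in preds[task]]
            return max([proc_free[proc]] + arrivals)

        proc = min(range(num_procs), key=earliest_start)
        start = earliest_start(proc)
        finish = start + cost[task]
        placed[task] = (proc, start, finish)
        proc_free[proc] = finish
        remaining.remove(task)
    return placed


# Hypothetical four-task graph: a -> c, a -> d, b -> d.
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
succs = {"a": ["c", "d"], "b": ["d"]}
schedule = list_schedule(cost, succs, transfer=1, num_procs=2)
for name in sorted(schedule):
    print(name, schedule[name])
```

In a schedule of this kind every start and finish cycle is fixed before execution. That property is what a compiler would need in order to drop run-time synchronization, provided, as the abstract states for OSCAR, that the code can be timed at the machine-clock level.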
