Abstract
On modern distributed-memory systems, the loop partitioning problem is no longer fully communication bound, primarily because the ratio of communication cost to computation cost is significantly lower than it used to be. Useful parallelism can be exploited on these systems as long as communication is balanced against parallelism and its overhead does not nullify the gains from parallel execution. Based on this motivation, we describe a compile-time partitioning and scheduling approach for DOALL loops in which communication is inevitable unless data is replicated. First, the code partitioning phase analyzes the references in the body of the DOALL loop nest and determines a set of directions along which a larger amount of communication can be eliminated by giving up a smaller amount of parallelism. Next, the data distribution phase uses a new larger-partition-owns rule to balance both the computation and the communication load. The granularity adjustment phase then attempts to eliminate further communication by merging partitions, with the aim of reducing the completion time. Finally, the load balancing phase attempts to reduce the number of processors without degrading the completion time, and the mapping phase schedules the partitions on the available processors. We develop the relevant theory and algorithms and present a performance evaluation on the Cray T3D.
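The idea behind the granularity adjustment phase, merging partitions when doing so lowers the completion time, can be illustrated with a small sketch. This is not the algorithm from the paper: the cost model (a partition's time is its compute cost plus the cost of every message it exchanges), the greedy pairwise merging, and all names (`completion_time`, `merge`, `adjust_granularity`) are assumptions made purely for illustration.

```python
from itertools import combinations


def completion_time(compute, comm):
    """Hypothetical cost model: the slowest partition's compute cost plus
    the cost of all messages it exchanges with other partitions."""
    return max(
        cost + sum(c for pair, c in comm.items() if name in pair)
        for name, cost in compute.items()
    )


def merge(compute, comm, a, b):
    """Return new (compute, comm) with partitions a and b fused into one."""
    merged = a + "+" + b
    new_compute = {n: c for n, c in compute.items() if n not in (a, b)}
    new_compute[merged] = compute[a] + compute[b]
    new_comm = {}
    for pair, c in comm.items():
        if pair == frozenset((a, b)):
            continue  # traffic between a and b becomes local after merging
        renamed = frozenset(merged if n in (a, b) else n for n in pair)
        new_comm[renamed] = new_comm.get(renamed, 0) + c
    return new_compute, new_comm


def adjust_granularity(compute, comm):
    """Greedily apply the best pairwise merge while it strictly lowers the
    estimated completion time (an assumed heuristic, not the paper's)."""
    while len(compute) > 1:
        best = min(
            (merge(compute, comm, a, b)
             for a, b in combinations(sorted(compute), 2)),
            key=lambda state: completion_time(*state),
        )
        if completion_time(*best) >= completion_time(compute, comm):
            break  # no merge helps: lost parallelism outweighs saved traffic
        compute, comm = best
    return compute, comm
```

With three partitions of compute cost 4, 4, and 8, where the first two exchange a message of cost 6, the sketch merges the first two and then stops, because merging the remaining pair would give up more parallelism than the communication it removes.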