Abstract

The authors consider a new dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. It is shown that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. The authors propose a loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and colocate loop iterations with the necessary data. They compare the performance of this algorithm to that of other known algorithms using four representative applications on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, and a Sequent Symmetry, and they show that the algorithm offers substantial performance improvements, up to a factor of 3 in some cases. They conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.
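To make the idea described in the abstract concrete, the following is a minimal sketch of an affinity-style loop scheduler in C11 with pthreads. It is not the authors' exact algorithm: it only illustrates the general strategy of running iterations from a processor's own statically assigned partition first (for data locality and low synchronization) and letting idle processors steal chunks from other partitions (for load balance). The names NTHREADS, CHUNK, N, and the loop body body() are placeholder assumptions for this sketch.

```c
/*
 * Illustrative sketch only: affinity-style loop scheduling.
 * Each thread drains its own partition of iterations first, then
 * steals leftover chunks from other partitions.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS 4          /* assumed processor count */
#define N        1000000    /* assumed loop trip count */
#define CHUNK    1024       /* iterations claimed per synchronization */

static double data[N];

/* Per-thread partition: next iteration to hand out and partition end. */
typedef struct {
    atomic_long next;
    long        end;
    char        pad[64];    /* avoid false sharing between partitions */
} partition_t;

static partition_t part[NTHREADS];

static void body(long i) { data[i] = data[i] * 2.0 + 1.0; }  /* placeholder loop body */

/* Claim a chunk [lo, hi) from partition p; return 0 if it is exhausted. */
static int claim(partition_t *p, long *lo, long *hi)
{
    long start = atomic_fetch_add(&p->next, CHUNK);
    if (start >= p->end) return 0;
    *lo = start;
    *hi = (start + CHUNK < p->end) ? start + CHUNK : p->end;
    return 1;
}

static void *worker(void *arg)
{
    long id = (long)arg, lo, hi;

    /* Phase 1: run iterations from our own partition (local data). */
    while (claim(&part[id], &lo, &hi))
        for (long i = lo; i < hi; i++) body(i);

    /* Phase 2: steal chunks from partitions that still have work. */
    for (int t = 0; t < NTHREADS; t++)
        while (t != id && claim(&part[t], &lo, &hi))
            for (long i = lo; i < hi; i++) body(i);

    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    long per = (N + NTHREADS - 1) / NTHREADS;

    /* Static block partition: iteration i is "owned" by thread i / per. */
    for (long t = 0; t < NTHREADS; t++) {
        atomic_init(&part[t].next, t * per);
        part[t].end = ((t + 1) * per < N) ? (t + 1) * per : N;
    }
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    printf("done: data[0] = %f\n", data[0]);
    return 0;
}
```

In this sketch, the fixed static partition stands in for the "colocate iterations with data" goal, the per-chunk atomic counter stands in for "minimize synchronization", and the stealing phase stands in for "balance the workload"; the paper's algorithm and chunk-sizing policy differ in detail.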
