Abstract

This study uses real system measurements to investigate the relationships between loop granularity, parallel loop distribution, and barrier wait times, and their impact on the multiprogramming performance of loop-parallel applications on the CEDAR shared-memory multiprocessor. The overhead due to multiprogramming varies from 5% for applications with large loop granularity to 140% for applications with very fine-grain loops. This is because applications with fine-grain loops have unequal parallel work distribution among the clusters in multiprogrammed environments, while the parallel work in applications with large loop granularity is equally distributed. Moreover, increased barrier wait times of the main task and wait-for-work times of the helper tasks also contribute to the multiprogramming performance degradation of the fine-grain loop-parallel applications. We propose and implement a self-preemption technique to address the problem of increased barrier wait times and wait-for-work times. Using this technique, the overhead due to multiprogramming is reduced by as much as 100%, and speedups of 1.1 to 1.7 are obtained.
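The abstract does not detail the mechanism, but a self-preemption check in a helper task's wait-for-work loop might look like the minimal C sketch below. All names here (work_available, SPIN_LIMIT, helper_wait_for_work) are illustrative assumptions, not taken from the paper; the point is only that a task which finds no work after a bounded spin voluntarily yields its processor rather than busy-waiting, so that under multiprogramming the processor can run another application's task.

    #include <sched.h>       /* sched_yield (POSIX) */
    #include <stdatomic.h>   /* C11 atomics */

    atomic_int work_available = 0;  /* set by the main task when a parallel loop is scheduled */
    #define SPIN_LIMIT 1000         /* illustrative spin budget before self-preempting */

    void helper_wait_for_work(void)
    {
        int spins = 0;
        while (!atomic_load(&work_available)) {
            if (++spins >= SPIN_LIMIT) {
                sched_yield();   /* self-preempt: give the processor back to the scheduler */
                spins = 0;
            }
        }
        /* work_available is set: proceed to pick up loop iterations */
    }

The same bounded-spin-then-yield pattern would apply to the main task's barrier wait, which the paper identifies as the other source of multiprogramming overhead.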
