Abstract

What are the performance benefits of selectively relaxing the locality preferences of some tasks in parallel applications? Can load-balancing algorithms for a distributed-memory cluster benefit from this relaxation? This work investigates these questions by selecting tasks according to application-level task locality rather than hardware memory topology, which is the norm in the literature. A prototype designed to evaluate these ideas is implemented in X10, a realization of the asynchronous partitioned global address space programming model. The evaluation shows that the new approach applies to several real-world applications chosen from the Cowichan and Lonestar suites. On a cluster of 128 processors, the new work-stealing strategy achieves speedups of 12% to 32% over X10’s existing scheduler. Moreover, the new strategy does not degrade the performance of any of the applications studied.
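
The abstract does not describe the scheduler itself, so the following is only a minimal sketch of the general idea, assuming that each task carries an application-level flag saying whether its locality preference may be relaxed. The names `Task`, `Place`, `flexible`, and `steal_from` are hypothetical and are not taken from the paper or from the X10 runtime. An idle place steals only tasks marked as locality-flexible and leaves locality-sensitive tasks at their home place.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    name: str
    # Hypothetical application-level annotation: True means the task may
    # run away from its home place without a large locality penalty.
    flexible: bool


@dataclass
class Place:
    ident: int
    queue: deque = field(default_factory=deque)

    def steal_from(self, victim: "Place") -> Optional[Task]:
        """Move the oldest locality-flexible task from victim to this place.

        Returns the stolen task, or None if the victim holds only
        locality-sensitive tasks.
        """
        candidate = next((t for t in victim.queue if t.flexible), None)
        if candidate is None:
            return None
        victim.queue.remove(candidate)
        self.queue.append(candidate)
        return candidate


# Illustrative setup: one busy place and one idle place.
busy = Place(0, deque([
    Task("update-row-0", flexible=False),   # must stay at its home place
    Task("reduce-chunk-1", flexible=True),
    Task("reduce-chunk-2", flexible=True),
]))
idle = Place(1)

first = idle.steal_from(busy)
second = idle.steal_from(busy)
third = idle.steal_from(busy)  # only the locality-sensitive task remains
```

In this sketch the locality-sensitive task is never migrated, which mirrors the abstract's claim of selective rather than blanket relaxation; how the real prototype chooses victims and tasks across a cluster is not stated here.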

