Hybrid parallel task placement in irregular applications

Jeeva Paudel, José Nelson Amaral

doi:10.1016/j.jpdc.2014.09.014

Abstract

What are the performance benefits of selectively relaxing the locality preferences of some tasks in parallel applications? Can load-balancing algorithms for a distributed-memory cluster benefit from this relaxation? This work investigates these questions by selecting tasks according to application-level task locality rather than hardware memory topology, as is the norm in the literature. A prototype designed to evaluate these ideas is implemented in X10, a realization of the asynchronous partitioned global address space (APGAS) programming model. The evaluation shows that the new approach applies to several real-world applications chosen from the Cowichan and the Lonestar suites. On a cluster of 128 processors, the new work-stealing strategy achieves speedups between 12% and 32% over X10’s existing scheduler. Moreover, the new strategy does not degrade the performance of any of the applications studied.
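As a rough illustration of what selectively relaxing locality preferences can mean for work stealing, the following minimal Python sketch lets an idle place steal only tasks marked as locality-flexible, while locality-sensitive tasks stay at their home place. It is not the paper's algorithm or X10 code: the `Task` and `Place` classes, the `flexible` flag, and the `steal` function are hypothetical names introduced here for illustration only.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    home: int       # place whose data the task prefers to run near
    flexible: bool  # assumed flag: True if the task may run away from home


@dataclass
class Place:
    pid: int
    queue: deque = field(default_factory=deque)


def steal(thief: Place, places: List[Place]) -> Optional[Task]:
    """Move one locality-flexible task from another place to the thief."""
    for victim in places:
        if victim.pid == thief.pid:
            continue
        # Scan from the oldest end of the victim's queue.
        for task in victim.queue:
            if task.flexible:
                victim.queue.remove(task)
                thief.queue.append(task)
                return task  # return at once; the queue is not iterated further
    return None  # only locality-sensitive tasks remain, so nothing moves


places = [Place(0), Place(1)]
places[0].queue.extend([
    Task("update-local-mesh", home=0, flexible=False),
    Task("pure-compute", home=0, flexible=True),
])

stolen = steal(places[1], places)   # idle place 1 looks for work
second = steal(places[1], places)   # nothing flexible is left at place 0
```

In this sketch the first steal moves `pure-compute` to place 1, and the second attempt returns `None` because the remaining task is pinned to its home place. A real scheduler would also need a policy for deciding which tasks are flexible, which this sketch simply takes as given.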
