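Universidade da Coruña, CITIC, Computer Architecture Group, 15071 A Coruña, Spain; Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Dpto.

Abstract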

Efficiently implementing the divide-and-conquer pattern of parallelism in distributed memory systems is very relevant, given its ubiquity, and difficult, given its recursive nature and the need to exchange tasks and data among the processors. The task becomes considerably more complicated on multi-core systems, where hybrid parallelism must be exploited to attain the best performance. It is also harder when unbalanced and deep workloads are considered, since additional measures must then be taken to balance the load and avoid deep recursion problems. This manuscript presents a parallel skeleton that fulfills all these requirements while providing high levels of usability. The evaluation shows that the proposal is on average 415.32% faster than MPI codes and 229.18% faster than MPI + OpenMP benchmarks, while improving the programmability metrics on average by 131.04% over MPI alternatives and by 155.18% over MPI + OpenMP solutions.

Highlights

  • The development of parallel applications requires a great effort when the best possible performance is sought and the targets are distributed memory systems or, worse, systems with portions of both shared and distributed memory.

  • One of the best approaches to deal with this problem is the use of parallel skeletons [17], which hide the complexity of parallelism while providing good performance, high-level semantics and easy-to-use APIs.

  • The T3XXL tree is predefined in the UTS (Unbalanced Tree Search) distribution package and describes a binomial tree, a very unbalanced and unpredictable problem. This makes it an ideal adversary for load balancing strategies.


Summary

Introduction

The development of parallel applications requires a great effort when the best possible performance is sought and the targets are distributed memory systems or, worse, systems with portions of both shared and distributed memory. One of the best approaches to deal with this problem is the use of parallel skeletons [17], which hide the complexity of parallelism while providing good performance, high-level semantics and easy-to-use APIs. Skeletons target different parallel patterns, and one of the most relevant is arguably divide-and-conquer, denoted D&C. One reason for this relevance is that a properly designed D&C skeleton can also express several other basic and critical parallel patterns, such as map or reduce, which greatly expands its scope of applicability.
