Abstract

With the recent arrival of the exascale era, modern supercomputers are growing ever larger, which makes programming them much more complex. In addition to performance, software productivity is a major concern when choosing a programming language for exascale computing, such as Chapel. In this paper, we investigate the design of a parallel distributed tree‐search algorithm, namely P3D‐DFS, and its implementation in Chapel. The design is based on Chapel's DistBag data structure, which we revisit by: (1) redefining the data structure for Depth‐First tree‐Search (DFS), henceforth renamed DistBag‐DFS; and (2) redesigning the underlying load‐balancing mechanism. In addition, we propose two instantiations of P3D‐DFS, one for the Branch‐and‐Bound (B&B) algorithm and one for the Unbalanced Tree Search (UTS) algorithm. To evaluate how much performance is traded for productivity, we compare the Chapel‐based implementations of B&B and UTS to their best‐known counterparts based on traditional OpenMP (intra‐node) and MPI+X (inter‐node). For experimental validation using 4096 processing cores, we consider the permutation flow‐shop scheduling problem for B&B and synthetic benchmarks from the literature for UTS. The reported results show that P3D‐DFS competes with its OpenMP baselines in coarser‐grained shared‐memory scenarios, and with its MPI+X counterparts in distributed‐memory settings, in terms of both performance and productivity. In the context of this work, this makes Chapel an alternative to OpenMP/MPI+X for exascale programming.

