Abstract

Nested loops are the main source of parallelism in many scientific applications. Partitioning the iteration space of nested loops with data dependencies into tiles, and assigning those tiles to processing nodes for parallel execution, is essential for achieving high performance. Although most previous work has focused on tiling for fully connected homogeneous distributed systems, some studies have addressed tiling for partially connected distributed systems. In this paper, we address the parallelization of perfectly nested loops with dependencies on partially connected heterogeneous distributed systems and present a topology- and computational-power-aware tile mapping. The approach takes into account not only each node's computational power when tiling the iteration space of nested loops but also the network topology when mapping tiles to processing nodes. This allows it to minimize parallel execution time by improving load balancing and reducing communication costs. We evaluate the performance of the proposed method by comparing it with computational-power-aware tile mapping and topology-aware tile mapping. The experimental results show that the proposed method improves parallel execution time by up to 62% and 28% compared with computational-power-aware tile mapping and topology-aware tile mapping, respectively.
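To make the two ingredients concrete, here is a minimal illustrative sketch. It is not the authors' algorithm. It assumes the outermost loop's N iterations are cut into contiguous tile blocks, one per node, and that consecutive blocks exchange boundary data along the dependence direction. The computational-power-aware part sizes each block in proportion to a node's relative speed (using largest-remainder rounding). The topology-aware part is a swapped-in greedy heuristic that orders nodes so that consecutive blocks land on directly connected nodes whenever possible. The function names `power_aware_split`, `topology_aware_order`, and `map_tiles`, the adjacency-set encoding of the network, and the choice of start node are all hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Set, Tuple


def power_aware_split(n_iters: int, powers: Sequence[float]) -> List[int]:
    """Split n_iters iterations among nodes in proportion to their power.

    Hypothetical helper: floors each exact share, then gives the leftover
    iterations to the nodes with the largest fractional parts, so the shares
    always sum to n_iters.
    """
    total = sum(powers)
    exact = [n_iters * p / total for p in powers]
    shares = [int(x) for x in exact]  # floor, since every x >= 0
    leftover = n_iters - sum(shares)
    by_fraction = sorted(range(len(powers)),
                         key=lambda i: exact[i] - shares[i], reverse=True)
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares


def topology_aware_order(adj: Dict[int, Set[int]], start: int) -> List[int]:
    """Greedily order nodes so that consecutive nodes are direct neighbours.

    Assumption: consecutive tile blocks communicate, so placing them on
    adjacent nodes avoids multi-hop messages in a partially connected network.
    If the walk reaches a dead end, it jumps to the lowest-numbered unvisited node.
    """
    order = [start]
    visited = {start}
    while len(order) < len(adj):
        candidates = sorted(adj[order[-1]] - visited)
        if not candidates:
            candidates = sorted(set(adj) - visited)  # dead end: fall back
        nxt = candidates[0]
        order.append(nxt)
        visited.add(nxt)
    return order


def map_tiles(n_iters: int, powers: Mapping[int, float],
              adj: Dict[int, Set[int]], start: int) -> List[Tuple[int, int, int]]:
    """Return (node, first_iter, end_iter) for each node's contiguous tile block."""
    order = topology_aware_order(adj, start)
    shares = power_aware_split(n_iters, [powers[v] for v in order])
    mapping, lo = [], 0
    for node, share in zip(order, shares):
        mapping.append((node, lo, lo + share))
        lo += share
    return mapping


if __name__ == "__main__":
    # Path-shaped network 2 - 0 - 1 - 3; node 0 is four times faster than 2 and 3.
    adj = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1}}
    powers = {0: 4.0, 1: 2.0, 2: 1.0, 3: 1.0}
    print(map_tiles(16, powers, adj, start=2))
    # [(2, 0, 2), (0, 2, 10), (1, 10, 14), (3, 14, 16)]
```

In the example, the fastest node receives half of the 16 iterations, and each pair of consecutive blocks sits on a directly linked pair of nodes. A real mapper would also weigh link bandwidth and the tile shape along every loop dimension, which this sketch deliberately ignores.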
