Abstract

This work addresses the issue of exploiting intra-tile parallelism by overlapping communication with computation removing the restriction of atomicity of tiles. The effectiveness of tiling is then critically dependent on the execution order of tasks within a tile. We present a theoretical framework based on equivalence classes that provides an optimal task ordering under assumptions of constant and different permutations of tasks in individual tiles. Our framework is able to handle constant but compile-time unknown dependences by generating optimal task permutations at run-time and results in significantly lower loop completion times. Our solution is an improvement over previous approaches (Chou and Kung, 1993) (Dion et al., 1995) and is optimal for all problem instances. We also propose efficient algorithms that provide the optimal solution. The framework has been implemented as an optimization pass in the SUIF compiler and has been tested on a distributed memory system using a message passing model. We show that the performance improvement over previous results is substantial.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call