Abstract

In this paper we address the issue of iteration space tiling to minimize the completion time of loops when executed on multicomputers. The previous work on tiling assumes atomic execution of tiles to minimize synchronization costs. In this work, we remove the restriction of atomicity of tiles so that internal parallelism within tiles is exploited by overlapping computation with communication on multicomputers. The effectiveness of tiling is then critically dependent on the execution order of tasks within a tile. In this paper we present a theoretical framework based on equivalence classes that provides an optimal task ordering under assumptions of fixed and variable orderings of tasks in individual tiles. Our framework is able to handle loop invariant compile-time unknown dependences by efficiently generating optimal task orderings at run-time and results in lower loop completion times. Our solution is an improvement over previous approaches [Proceedings of Euromicro Workshop on Parallel and Distributed Processing, IEEE Computer Society Press, 1995, pp. 571–580; Proceedings of the International Conference on Application Specific Array Processors (ASAP), 1993, pp. 53–64]. Unlike [Proceedings of Euromicro Workshop on Parallel and Distributed Processing, IEEE Computer Society Press, 1995, pp. 571–580; Proceedings of the International Conference on Application Specific Array Processors (ASAP), 1993, pp. 53–64], our approach is optimal for all problem instances with one dependence vector in one-dimension. We show that the performance improvement over previous results is good.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call