Abstract

We present a new space-time loop tiling approach and demonstrate its application for the generation of parallel tiled code of enhanced locality for three dynamic programming algorithms. The technique envisages that, for each loop nest statement, sub-spaces are first generated so that the intersection of them results in space tiles. Space tiles can be enumerated in lexicographical order or in parallel by using the wave-front technique. Then, within each space tile, time slices are formed, which are enumerated in lexicographical order. Target tiles are represented with multiple time slices within each space tile. We explain the basic idea of space-time loop tiling and then illustrate it by means of an example. Then, we present a formal algorithm and prove its correctness. The algorithm is implemented in the publicly available TRACO compiler. Experimental results demonstrate that parallel codes generated by means of the presented approach outperform closely related manually generated ones or those generated by using affine transformations. The main advantage of code generated by means of the presented approach is its enhanced locality due to splitting each larger space tile into multiple smaller tiles represented with time slices.

Highlights

  • In this paper, we deal with the generation of parallel tiled code for dynamic programming codes.Increasing dynamic programming code performance is not a trivial problem because, in general, that code exposes affine non-uniform data dependence patterns typical for nonserial polyadic dynamic programming (NPDP) [1], preventing tiling the innermost loop in a loop nest that restricts code locality improvement

  • For a given loop nest, optimizing compilers based on affine transformations such as PLuTo to generate tiles for which its dimension is equal to the number of linear independent solutions to the time partition constraints formed for that nest [16]

  • The paper presents a novel approach for space-time loop tiling implemented in the publicly available TRACO compiler

Read more

Summary

Introduction

We deal with the generation of parallel tiled code for dynamic programming codes. Well-known polyhedral techniques and corresponding compilers can be applied to automatically tile and can parallelize such a code, for example, the PLuTo and PPCG academic compilers as well as the commercial R-STREAM and IBM-XL compilers They extract and apply affine transformations to tile and parallelize loop nests. PLuTo fails to tile all loops in a given loop nest in the case of DP programs exposing non-uniform dependences [10] This reduces the locality of target tiled code. The approach presented in this paper can be applied to any affine loop nest, but its effectiveness is empirically confirmed by us only for a class of dynamic programming codes that expose affine non-uniform dependences for which its features prevent tiling one or more the innermost loops in a loop nest. Adapting the approach for resolving other problems is a topic for future research

Background
Overview of Space-Time Tiling
Imperfectly Nested Loops
Parallel Code Generation
Formal Algorithm
Applying Space-Time Tiling to the Examined Loop Nests
Related Work
Conclusions
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call