Abstract

We address the problem of optimizing sparse tensor algebra in a compiler and show how to define standard loop transformations---split, collapse, and reorder---on sparse iteration spaces. The key idea is to track the transformation functions that map the original iteration space to derived iteration spaces. These functions are needed by the code generator to emit code that maps coordinates between iteration spaces at runtime, since the coordinates in the sparse data structures remain in the original iteration space. We further demonstrate that derived iteration spaces can tile both the universe of coordinates and the subset of nonzero coordinates: the former is analogous to tiling dense iteration spaces, while the latter tiles sparse iteration spaces into statically load-balanced blocks of nonzeros. Tiling the space of nonzeros lets the generated code efficiently exploit heterogeneous compute resources such as threads, vector units, and GPUs. We implement these concepts by extending the sparse iteration theory implementation in the TACO system. The associated scheduling API can be used by performance engineers or it can be the target of an automatic scheduling system. We outline one heuristic autoscheduling system, but other systems are possible. Using the scheduling API, we show how to optimize mixed sparse-dense tensor algebra expressions on CPUs and GPUs. Our results show that the sparse transformations are sufficient to generate code with competitive performance to hand-optimized implementations from the literature, while generalizing to all of the tensor algebra.

Highlights

  • Sparse tensor algebra compilers, unlike their dense counterparts, lack an iteration space transformation framework

  • We propose a unified sparse iteration space transformation framework for the dense and sparse iteration spaces that come from sparse tensor algebra

Summary

Introduction

Sparse tensor algebra compilers, unlike their dense counterparts, lack an iteration space transformation framework. Dense tensor algebra compilers, such as TCE [Auer et al. 2006], Halide [Ragan-Kelley et al. 2012], TVM [Chen et al. 2018a], and TC [Vasilache et al. 2018], build on many decades of research on affine loop transformations [Allen and Cocke 1972; Wolfe 1982], leading to sophisticated models like the polyhedral model [Feautrier 1988; Lamport 1974]. Despite progress over the last three decades [Bik and Wijshoff 1993; Kotlyar et al. 1997; Strout et al. 2018], we lack a unifying sparse iteration space framework that can model all of sparse tensor algebra. Without a general transformation framework, sparse tensor algebra compilers cannot optimize the dense loops in mixed sparse and dense expressions.
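The two kinds of tiling the abstract distinguishes can be sketched in plain Python (the function names below are illustrative, not TACO's API): a coordinate split tiles the universe of coordinates, analogous to strip-mining a dense loop, while a position split tiles the list of nonzeros themselves into statically load-balanced blocks.

```python
# Sketch (illustrative names, not TACO's API) of the two split variants.

def split_coordinate(i, factor):
    """Coordinate split: map an original coordinate i into the derived
    space (outer, inner). Tiles the universe of coordinates, like
    strip-mining a dense loop."""
    return i // factor, i % factor

def split_position(nnz, block_size):
    """Position split: tile the sequence of nonzeros into fixed-size
    blocks, regardless of where they fall in the coordinate space,
    giving static load balance."""
    return [nnz[p:p + block_size] for p in range(0, len(nnz), block_size)]

# Nonzero column coordinates of one sparse row (CSR-style).
cols = [0, 3, 4, 31, 32, 64, 65, 66]

# Coordinate split by 32: nonzeros land in whichever tile contains them,
# so tiles can be badly imbalanced.
coord_tiles = {}
for c in cols:
    outer, inner = split_coordinate(c, 32)
    coord_tiles.setdefault(outer, []).append(inner)
print(coord_tiles)          # {0: [0, 3, 4, 31], 1: [0], 2: [0, 1, 2]}

# Position split by 4: every block holds exactly 4 nonzeros (except
# possibly the last), independent of the nonzero distribution.
print(split_position(cols, 4))   # [[0, 3, 4, 31], [32, 64, 65, 66]]
```

This is also why the generated code must map coordinates between spaces at runtime: the sparse data structure stores coordinates only in the original space, so the derived-space indices have to be recomputed from them.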

