Compiler transformation to generate hybrid sparse computations

Huihui Zhang, Anand Venkat, Mary Hall

doi:10.5555/3018843.3018849

Abstract

Applications over sparse matrices and graphs often rely on efficient matrix representations that exploit the nonzero structure of the matrix. In some cases, this structure varies within the matrix, e.g., some portions are denser while others are very sparse. For such matrices, sparse linear algebra and graph libraries commonly use hybrid algorithms that employ multiple representations and computations. Automating such an approach in a compiler is difficult because it depends on analysis of the input matrix, which is only available at runtime. This paper describes compiler and runtime support for generating hybrid implementations. It automatically partitions the input matrix or graph into multiple disjoint subsets that correspond to regions with significantly different nonzero structure. These subsets can then be optimized separately. For this purpose, the paper introduces a non-affine split transformation, which automatically generates an inspector and multiple executors. The inspector analyzes and partitions the input matrix according to the split criteria. The resulting executors are further optimized with customized transformations to derive specialized representations. We demonstrate the performance gains of hybrid implementations on an Nvidia K20c (Kepler) GPU for two examples from sparse linear algebra and graph analytics: sparse matrix-vector multiplication and stochastic gradient descent.
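The inspector/executor split described in the abstract can be illustrated with a minimal sequential sketch. It is not the paper's generated code: it assumes a CSR input, borrows the well-known ELL+COO hybrid layout as the pair of specialized representations, and uses a hypothetical split criterion (the first k nonzeros of each row go to the padded ELL part, the remainder to COO). All function names and the parameter k are illustrative assumptions.

```python
def inspect_split(row_ptr, col_idx, vals, k):
    """Inspector (assumed criterion): slot < k within a row goes to ELL, the rest to COO."""
    n_rows = len(row_ptr) - 1
    # ELL part: exactly k slots per row, padded with column 0 and value 0.0.
    ell_cols = [[0] * k for _ in range(n_rows)]
    ell_vals = [[0.0] * k for _ in range(n_rows)]
    coo = []  # overflow entries as (row, col, value) triples
    for i in range(n_rows):
        for slot, p in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            if slot < k:
                ell_cols[i][slot] = col_idx[p]
                ell_vals[i][slot] = vals[p]
            else:
                coo.append((i, col_idx[p], vals[p]))
    return (ell_cols, ell_vals), coo


def execute_ell(ell, x, y):
    """Executor 1: regular, fixed-width loop over the padded ELL subset."""
    ell_cols, ell_vals = ell
    for i, (cols, row_vals) in enumerate(zip(ell_cols, ell_vals)):
        for c, v in zip(cols, row_vals):
            y[i] += v * x[c]  # padded slots contribute 0.0 * x[0]


def execute_coo(coo, x, y):
    """Executor 2: irregular loop over the overflow entries."""
    for i, c, v in coo:
        y[i] += v * x[c]


def hybrid_spmv(row_ptr, col_idx, vals, x, k):
    """Run the inspector once, then both executors on their disjoint subsets."""
    ell, coo = inspect_split(row_ptr, col_idx, vals, k)
    y = [0.0] * (len(row_ptr) - 1)
    execute_ell(ell, x, y)
    execute_coo(coo, x, y)
    return y


# 3x4 example matrix in CSR form:
#   [1 0 2 0]
#   [0 3 0 0]
#   [4 5 6 7]
row_ptr = [0, 2, 3, 7]
col_idx = [0, 2, 1, 0, 1, 2, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 1.0, 1.0, 1.0]

y = hybrid_spmv(row_ptr, col_idx, vals, x, k=2)
```

With k=2, the two short rows fit entirely in the ELL part and only the last two entries of the long third row spill into COO. Because the two subsets are disjoint and each nonzero lands in exactly one of them, the executors can accumulate into the same output vector, and each could be optimized independently (for example, mapped to different GPU kernels).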

Full Text