Abstract

Tensor contractions are higher-dimensional generalizations of matrix-matrix multiplication. They form the compute-intensive core of many applications in computational science and data science. In this paper, we describe a high-performance GPU code generator for arbitrary tensor contractions. It exploits domain-specific properties of data reuse in tensor contractions to devise an effective code generation schema, coupled with a model-driven search, to determine parameters for mapping computation to threads and staging data through the GPU memory hierarchy. Experimental evaluation on a set of tensor contraction benchmarks demonstrates improved performance and/or significantly reduced code generation time compared with other state-of-the-art tensor contraction libraries and code generators.
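For concreteness, a contraction such as C[i,j,m] = sum_k A[i,k] * B[k,j,m] generalizes matrix multiplication by carrying an extra uncontracted mode m. The sketch below shows a naive one-thread-per-output CUDA kernel for this example; the index names and the extents NI, NJ, NK, NM are illustrative assumptions, and the kernel is a baseline for intuition, not the paper's generated code, which additionally tiles the computation and stages operands through shared memory and registers.

    // Naive CUDA kernel for the hypothetical contraction
    //   C[i][j][m] = sum_k A[i][k] * B[k][j][m]
    // All arrays are row-major; extents are assumed for illustration.
    #include <cuda_runtime.h>

    #define NI 64
    #define NJ 64
    #define NK 64
    #define NM 64

    __global__ void contract(const float *A, const float *B, float *C) {
        // One thread computes one output element C[i][j][m].
        int m = blockIdx.x * blockDim.x + threadIdx.x;  // fastest-varying mode
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        int i = blockIdx.z;
        if (i >= NI || j >= NJ || m >= NM) return;

        float acc = 0.0f;
        for (int k = 0; k < NK; ++k)  // k is the contracted index
            acc += A[i * NK + k] * B[(k * NJ + j) * NM + m];
        C[(i * NJ + j) * NM + m] = acc;
    }

Because every thread re-reads the same rows of A and slabs of B from global memory, data reuse in such a naive mapping is poor; exploiting that reuse is precisely what the code generation schema and model-driven search described above target.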
