Abstract

Tensor contractions are higher-dimensional generalizations of matrix-matrix multiplication. They form the compute-intensive core of many applications in computational science and data science. In this paper, we describe a high-performance GPU code generator for arbitrary tensor contractions. It exploits domain-specific properties of data reuse in tensor contractions to devise an effective code generation schema, coupled with a model-driven search, to determine parameters for mapping computation to threads and staging data through the GPU memory hierarchy. Experimental evaluation on a set of tensor contraction benchmarks demonstrates performance improvements and/or significantly reduced code generation time relative to other state-of-the-art tensor contraction libraries and code generators.
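
As a concrete illustration (not drawn from the paper itself), the following minimal sketch shows how a tensor contraction generalizes matrix multiplication: matrix multiplication C[i,k] = sum_l A[i,l] * B[l,k] contracts one shared index between two 2-D arrays, while a general contraction sums over shared indices of higher-order tensors. The shapes and index labels here are illustrative assumptions.

```python
import numpy as np

# Matrix multiplication as the 2-D special case:
# C[i,k] = sum_l A2[i,l] * B2[l,k]
A2 = np.random.rand(4, 6)
B2 = np.random.rand(6, 7)
C2 = np.einsum('il,lk->ik', A2, B2)  # same as A2 @ B2

# A higher-dimensional contraction over the shared index l:
# C[i,j,k] = sum_l A[i,l,j] * B[l,k]
A = np.random.rand(4, 6, 5)   # indices (i, l, j)
B = np.random.rand(6, 7)      # indices (l, k)
C = np.einsum('ilj,lk->ijk', A, B)

assert C.shape == (4, 5, 7)   # uncontracted indices (i, j, k) remain
```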
