Sgap: towards efficient sparse tensor algebra compilation for GPU

Genghan Zhang, Guohao Dai, Zhongming Yu, Sitao Huang, Yanting Tao, Pavlos Petoumenos, Yu Wang, Yuan Wen, Yuetong Zhao

doi:10.1007/s42514-023-00140-4

Abstract

Sparse compilers are a promising solution for sparse tensor algebra optimization. In compiler implementation, reduction in sparse-dense hybrid algebra plays a key role in performance. Although GPUs provide various reduction semantics that can better utilize their parallel computing and memory bandwidth capacity, the central question is how to elevate these flexible reduction semantics to sparse compilation theory, which assumes serial execution. Specifically, we have to tackle two main challenges: (1) parallelism is wasted when a static synchronization granularity is adopted, and (2) a static reduction strategy limits exploration of the optimization space. We propose Sgap: segment group and atomic parallelism to solve these problems. Atomic parallelism captures the flexible reduction semantics to systematically analyze the optimization space of sparse-dense hybrid algebra on GPU. It is a new optimization technique that goes beyond current compiler-based approaches and open-source runtime libraries. Segment group elevates the flexible reduction semantics to suitable levels of abstraction in sparse compilation theory. It adopts a changeable group size and a user-defined reduction strategy to solve challenges (1) and (2), respectively. Finally, we use GPU sparse matrix-matrix multiplication (SpMM) on the TACO compiler as a use case to demonstrate the effectiveness of segment group in reduction semantics elevation. We achieve up to $1.2\times$ speedup over the original TACO SpMM kernels. We also apply new optimization techniques found by atomic parallelism to dgSPARSE, an open-source state-of-the-art SpMM library. We achieve $1.6\times$ to $2.3\times$ speedup on the algorithm tuned with atomic parallelism.
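To make the granularity trade-off in challenge (1) concrete, the following is a minimal serial sketch in Python, not the Sgap, TACO, or dgSPARSE implementation. It assumes a COO-format sparse matrix with one nonzero per emulated GPU lane; the function names `spmm_reference` and `spmm_grouped` and the `group_size` parameter are hypothetical. Lanes in the same group first combine partial products that share an output row, and each group then performs one accumulation per row into the result, which stands in for an atomic add.

```python
def spmm_reference(rows, cols, vals, B, n_rows):
    """Serial COO SpMM: C[r][:] += v * B[c][:] for every nonzero (r, c, v)."""
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, vals):
        for j in range(n_cols):
            C[r][j] += v * B[c][j]
    return C


def spmm_grouped(rows, cols, vals, B, n_rows, group_size):
    """Emulate lanes cooperating in groups of `group_size` (an assumed knob).

    Returns the result matrix and the number of per-row accumulations into C,
    which is a stand-in for the number of atomic row updates on a GPU.
    """
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    row_updates = 0
    for start in range(0, len(vals), group_size):
        stop = min(start + group_size, len(vals))
        # In-group reduction: merge partial products that target the same row.
        partial = {}
        for k in range(start, stop):
            acc = partial.setdefault(rows[k], [0.0] * n_cols)
            for j in range(n_cols):
                acc[j] += vals[k] * B[cols[k]][j]
        # One accumulation per (group, row) pair, emulating an atomic add.
        for r, acc in partial.items():
            for j in range(n_cols):
                C[r][j] += acc[j]
            row_updates += 1
    return C, row_updates


# Hypothetical 3x3 sparse matrix with 6 nonzeros and a dense 3x2 matrix.
rows = [0, 0, 0, 1, 1, 2]
cols = [0, 1, 2, 0, 2, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

for g in (1, 2, 4, 8):
    C, updates = spmm_grouped(rows, cols, vals, B, 3, g)
    print(g, updates, C == spmm_reference(rows, cols, vals, B, 3))
```

In this toy input, larger groups need fewer row updates, but on a real GPU a larger group also means more lanes synchronizing with each other, so the best group size depends on the sparsity pattern. That is the motivation for leaving the group size changeable rather than fixed.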

Journal: CCF Transactions on High Performance Computing
Publication Date: May 8, 2023
Citations: 1
