Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs

Daichi Mukunoki, Takeshi Ogita

doi:10.1016/j.cam.2019.112701

Abstract

This paper presents the implementation, performance, and energy consumption of accurate and mixed-precision linear algebra kernels on graphics processing units (GPUs): inner product (DOT), dense matrix–vector multiplication (GEMV), dense matrix multiplication (GEMM), and sparse matrix–vector multiplication (SpMV) for the compressed sparse row (CSR) format (CSRMV). Our implementation employs a mixed-precision design in which internal floating-point operations are performed with at least twice the precision of the input and output data: for binary32 data, the computation is performed in binary64, and for binary64 data, it is performed with twofold precision using an accurate inner-product algorithm referred to as Dot2. We developed highly optimized implementations that achieve performance close to the upper-bound performance. From our evaluation on a Titan V, a Volta-architecture GPU, we made the following observations. Because the Dot2 operation requires 11 times as many binary64 instructions, GEMM incurs a corresponding overhead in both execution time and energy consumption compared with the standard binary64 implementation. By contrast, the accuracy of DOT, GEMV, and CSRMV is improved with a very small overhead in execution time and up to roughly 30% overhead in energy consumption.
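
For readers unfamiliar with Dot2, the idea is to carry the rounding error of every product and every partial sum alongside the ordinary binary64 result, and to add that correction back at the end. The following is a minimal CPU-side sketch in Python, not the authors' GPU implementation. It assumes Dekker's splitting-based TwoProduct, since a fused multiply-add is not available in every Python version, whereas a GPU kernel would typically obtain the product error with an FMA instruction. The function names (`two_sum`, `split`, `two_prod`, `dot2`, `naive_dot`) are illustrative and do not come from the paper.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e


def split(a):
    """Veltkamp split: a == hi + lo, each half fitting in about 26 bits."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Dekker's TwoProduct: return (p, e) with p = fl(a * b) and p + e == a * b.

    Exact barring overflow or underflow. An FMA-based variant would compute
    the error term as fma(a, b, -p) instead of using the split.
    """
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def dot2(x, y):
    """Inner product evaluated as if in roughly twice binary64 precision."""
    p = 0.0  # running sum in working precision
    s = 0.0  # accumulated rounding errors
    for xi, yi in zip(x, y):
        h, r = two_prod(xi, yi)  # product and its rounding error
        p, q = two_sum(p, h)     # partial sum and its rounding error
        s += q + r
    return p + s


def naive_dot(x, y):
    """Plain binary64 inner product, for comparison."""
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc


x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
print(naive_dot(x, y))  # 0.0: the 1.0 is lost when added to 1e16
print(dot2(x, y))       # 1.0: the lost term is recovered from the error sum
```

Each loop iteration of `dot2` performs many more floating-point operations than the single multiply-add of `naive_dot`, which is the kind of instruction overhead the abstract refers to for compute-bound kernels such as GEMM.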


Journal: Journal of Computational and Applied Mathematics
Publication Date: Jan 7, 2020
Citations: 12
License type: publisher-specific-oa

Similar Papers

GPU Implementation of Image Convolution Using Sparse Model with Efficient Storage Format
Saira Banu Jamal Mohammed ... Sumithra Sriram
International Journal of Grid and High Performance Computing | VOL. 10
01 Jan 2018

GPU Sparse Matrix Vector Multiplication Optimization Based on ELLB Storage Format
Haonan Chen ... Lianglun Cheng
23 Feb 2023

Reconfigurable sparse/dense matrix-vector multiplier
Georgi Kuzmanov ... Mottaqiallah Taouil
01 Dec 2009

Energy consumption optimization of the Total-FETI solver and BLAS routines by changing the CPU frequency
David Horak ... Martin Beseda
01 Jul 2016
