A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines

Ahmad Abdelfattah, Sven Hammarling, Mark Gates, Mawussi Zounon, Stanimire Tomov, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, Timothy Costa, Nicholas J Higham, Azzam Haidar

doi:10.1145/3431921

Abstract

This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The matrices are arranged in uniformly sized groups, with just one group if all the matrices are of equal size. The aim is to provide more efficient, but portable, implementations of algorithms on high-performance many-core platforms. These include multicore and many-core CPUs, GPUs and coprocessors, and other hardware accelerators with floating-point compute capability. As well as the standard types of single and double precision, we also include half and quadruple precision in the standard. In particular, half precision is used in many very large scale applications, such as those associated with machine learning.
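The grouping idea in the abstract can be illustrated with a small sketch. The code below is not the API proposed in the article; the names `GemmGroup`, `gemm`, and `batched_gemm` are hypothetical and chosen only for illustration. It assumes each group holds matrix-multiply problems (C = alpha*A*B + beta*C) that share the same dimensions and scalars, and it processes all groups in a single call using plain Python lists in place of an optimized kernel.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]  # row-major list of rows


@dataclass
class GemmGroup:
    """Hypothetical container: one uniformly sized group of GEMM problems.

    Every problem in the group shares m, n, k, alpha, and beta.
    """
    m: int
    n: int
    k: int
    alpha: float
    beta: float
    a: List[Matrix]  # each matrix is m x k
    b: List[Matrix]  # each matrix is k x n
    c: List[Matrix]  # each matrix is m x n, updated in place


def gemm(m: int, n: int, k: int, alpha: float,
         a: Matrix, b: Matrix, beta: float, c: Matrix) -> None:
    """Reference update C = alpha*A*B + beta*C for one small problem."""
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = alpha * acc + beta * c[i][j]


def batched_gemm(groups: List[GemmGroup]) -> None:
    """Process many independent small GEMMs with a single call.

    The problems are independent, so a real implementation could run
    them in parallel; this sketch simply loops over them.
    """
    for g in groups:
        if not (len(g.a) == len(g.b) == len(g.c)):
            raise ValueError("each group needs equally many A, B, and C matrices")
        for a, b, c in zip(g.a, g.b, g.c):
            gemm(g.m, g.n, g.k, g.alpha, a, b, g.beta, c)
```

If every matrix in the batch has the same size, the caller would pass a list containing just one `GemmGroup`; batches with mixed sizes would be split into one group per distinct size.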

Journal: ACM Transactions on Mathematical Software
Publication Date: Jun 26, 2021
Citations: 20
