Abstract

As we approach the Exascale era, computer architectures are gaining ever more capable vector and matrix acceleration units—NVIDIA’s Ampere Tensor Cores, Intel’s AMX, and Arm’s SVE vector instruction set developments are just three recent examples [1, 2, 10]. To exploit these units, optimised math libraries, such as those for dense and sparse linear algebra, are expected to play an increasing role in achieving optimal performance. It is therefore useful to understand which of these functions dominate an application’s runtime, and in particular how this changes with increasing scale. This work aims to provide a contemporary dataset on how much dense linear algebra (BLAS) is used in HPC codes at scale. We have analysed several science codes widely used on the UK HPC service ARCHER (https://www.archer.ac.uk), including CASTEP, CP2K, QuantumESPRESSO, and Nektar++. To capture demands from the AI community, we have additionally traced the training stage of the convolutional neural network (CNN) AlexNet [7]. HPLinpack is also included as a reference, since it exhibits a well-understood BLAS usage pattern. Results across all the codes show that, unlike HPLinpack, these applications spend no more than 25% of their total runtime in BLAS, even when running at a modest scale (32 nodes of the Arm-based supercomputer Isambard). By Amdahl’s law, this limits the speedup achievable by accelerating BLAS alone, and suggests that application developers may need to adjust their algorithms to spend more time in optimised BLAS libraries to capitalise on new architectures and accelerators.
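
As a rough illustration of the Amdahl's law point above (a minimal sketch, not taken from the paper; the function name and the 8x figure are illustrative assumptions), the following Python snippet shows why a 25% BLAS fraction caps the attainable speedup:

    def amdahl_speedup(blas_fraction, blas_speedup):
        # Overall speedup when only the BLAS fraction of runtime is accelerated.
        return 1.0 / ((1.0 - blas_fraction) + blas_fraction / blas_speedup)

    # With at most 25% of runtime in BLAS, an 8x faster BLAS yields ~1.28x overall,
    # and even an infinitely fast BLAS is bounded by 1 / (1 - 0.25) ~= 1.33x.
    print(amdahl_speedup(0.25, 8.0))   # ~1.28
    print(1.0 / (1.0 - 0.25))          # ~1.33, the asymptotic limit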
