Accelerating BLAS and LAPACK via Efficient Floating Point Architecture Design

Farhad Merchant, S K Nandy, Ranjani Narayan, Soumyendu Raha, Anupam Chattopadhyay

doi:10.1142/s0129626417500062

Abstract

Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form the basic building blocks of many High Performance Computing (HPC) applications and therefore dictate the performance of those applications. Performance in such tuned packages is attained by tuning several algorithmic and architectural parameters, such as the number of parallel operations in the Directed Acyclic Graph of the BLAS/LAPACK routines, the sizes of the memories in the memory hierarchy of the underlying platform, the memory bandwidth, and the structure of the compute resources in the underlying platform. In this paper, we closely investigate the impact of the Floating Point Unit (FPU) micro-architecture on performance tuning of BLAS and LAPACK. We present a theoretical analysis of pipeline depth for different floating point operations such as multiplication, addition, square root, and division, followed by a characterization of BLAS and LAPACK to determine the parameters that the theoretical framework requires for deciding the optimum pipeline depth of the floating point operations. We present a simple design of a Processing Element (PE) and show that the PE outperforms the most recent custom realizations of BLAS and LAPACK by 1.1X to 1.5X in GFlops/W and by 1.9X to 2.1X in GFlops/mm². Compared to multicore, General Purpose Graphics Processing Unit (GPGPU), Field Programmable Gate Array (FPGA), and ClearSpeed CSX700 platforms, the PE reports a performance improvement of 1.8X to 80X.
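To make concrete what a characterization of the floating point operation mix can look like, the following is a minimal sketch that counts multiplications, additions, divisions, and square roots in a textbook unblocked Cholesky factorization (the operation that LAPACK exposes through its potrf routines). It is an illustration only and is not the authors' characterization method; the function name `cholesky_op_counts` and the counter keys are hypothetical.

```python
import math


def cholesky_op_counts(a):
    """Factor a symmetric positive-definite matrix (list of lists) as L * L^T.

    Returns the lower-triangular factor L and a dictionary counting the
    floating point operations performed, grouped by operation type.
    Illustrative sketch only; the names here are assumptions.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    counts = {"mul": 0, "add": 0, "div": 0, "sqrt": 0}

    for j in range(n):
        # Diagonal entry: subtract squares of the row computed so far.
        s = a[j][j]
        for k in range(j):
            s -= l[j][k] * l[j][k]
            counts["mul"] += 1
            counts["add"] += 1  # a subtraction uses the adder
        l[j][j] = math.sqrt(s)
        counts["sqrt"] += 1

        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j]
            for k in range(j):
                s -= l[i][k] * l[j][k]
                counts["mul"] += 1
                counts["add"] += 1
            l[i][j] = s / l[j][j]
            counts["div"] += 1

    return l, counts


if __name__ == "__main__":
    spd = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    factor, ops = cholesky_op_counts(spd)
    print(factor)
    print(ops)
```

For an n×n input this loop structure performs n square roots, n(n-1)/2 divisions, and n(n²-1)/6 multiplications and additions each, so multiply and add operations dominate as n grows while square root and division stay comparatively rare. That imbalance is the kind of routine-level parameter a pipeline depth analysis could take as input.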


Journal: Parallel Processing Letters	Publication Date: Dec 1, 2017
Citations: 10	License type: mit


Similar Papers

HPC Process and Optimal Network Device Affinitization
Ravindra Babu Ganapathi ... Aravind Gopalakrishnan
IEEE Transactions on Multi-Scale Computing Systems | VOL. 4
01 Oct 2018

Adapting to Hostile Architectural Environments
Scalable Computing Practice and Experience | VOL. 2
01 Jan 1998

Enabling high performance computing in cloud computing environments
M Kumaresan ... G.K.D Prasanna Venkatesan
01 Apr 2017

Software Libraries for Linear Algebra Computations on High Performance Computers
Jack J Dongarra ... David W Walker
SIAM Review | VOL. 37
01 Jun 1995
