Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator

Ardavan Pedram, Robert A Van De Geijn, Andreas Gerstlauer

doi:10.1109/tc.2014.2315627

Abstract

This paper examines the mapping of algorithms encountered when solving dense linear systems and linear least-squares problems to a custom Linear Algebra Processor. Specifically, the focus is on Cholesky, LU (with partial pivoting), and QR factorizations and their blocked algorithms. As part of the study, we expose the benefits of redesigning floating-point units and their surrounding datapaths to support these complicated operations. We show how adding moderate complexity to the architecture greatly reduces the complexity of the algorithms. We study design tradeoffs and the effectiveness of architectural modifications to demonstrate that we can improve power and performance efficiency to a level that can otherwise only be expected of full-custom ASIC designs. A feasibility study of the inner kernels is extended to the blocked level and shows that, at the block level, the Linear Algebra Core (LAC) can achieve high efficiencies of up to 45 GFLOPS/W for both Cholesky and LU factorization, and over 35 GFLOPS/W for QR factorization. While maintaining such efficiencies, our extensions to the MAC units can achieve up to 10, 12, and 20 percent speedup for the blocked algorithms of Cholesky, LU, and QR factorization, respectively.
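To make the term "blocked algorithm" concrete, the following is a minimal sketch of a generic, textbook right-looking blocked Cholesky factorization. It is not the paper's mapping onto the LAC, and it does not model the floating-point unit extensions; the function names `cholesky_unblocked` and `blocked_cholesky` and the block-size parameter `nb` are hypothetical names chosen for illustration. The inner kernel factors a small diagonal block, and the blocked level wraps it with a panel solve and a trailing-matrix update.

```python
import math


def cholesky_unblocked(L, lo, hi):
    """Inner kernel (illustrative): factor the diagonal block L[lo:hi, lo:hi] in place.

    Assumes the block is symmetric positive definite and that contributions
    from earlier block columns have already been subtracted from it.
    """
    for j in range(lo, hi):
        s = L[j][j] - sum(L[j][k] ** 2 for k in range(lo, j))
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, hi):
            t = L[i][j] - sum(L[i][k] * L[j][k] for k in range(lo, j))
            L[i][j] = t / L[j][j]


def blocked_cholesky(A, nb):
    """Return the lower-triangular L with A = L * L^T, using blocks of size nb."""
    n = len(A)
    L = [row[:] for row in A]  # work on a copy of the input
    for lo in range(0, n, nb):
        hi = min(lo + nb, n)
        # Step 1: factor the diagonal block, A11 = L11 * L11^T.
        cholesky_unblocked(L, lo, hi)
        # Step 2: panel solve, L21 = A21 * inverse(L11^T), one row at a time.
        for i in range(hi, n):
            for j in range(lo, hi):
                t = L[i][j] - sum(L[i][k] * L[j][k] for k in range(lo, j))
                L[i][j] = t / L[j][j]
        # Step 3: trailing update, A22 -= L21 * L21^T (lower triangle only).
        for i in range(hi, n):
            for j in range(hi, i + 1):
                L[i][j] -= sum(L[i][k] * L[j][k] for k in range(lo, hi))
    # Clear the strictly upper triangle, which still holds entries of A.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    for row in blocked_cholesky(A, nb=2):
        print(row)
```

In this sketch most of the arithmetic sits in the trailing update of step 3, which consists of multiply-accumulate operations, while the square root and divisions are confined to the small diagonal block and the panel solve.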


Journal: IEEE Transactions on Computers
Publication Date: Aug 1, 2014
Citations: 39

Similar Papers

Floating Point Architecture Extensions for Optimized Matrix Factorization
A Pedram ... R A Van De Geijn
01 Apr 2013

FT-ScaLAPACK
Panruo Wu ... Zizhong Chen
23 Jun 2014

Algorithm/Architecture Codesign of Low Power and High Performance Linear Algebra Compute Fabrics
Ardavan Pedram
01 May 2013

Analysis of Crout, Lu, Cholesky Decomposition, and QR Factorization: A Case Study On The Relationship Between Abiotic (Carbon and Nitrogen) and Biotic (Macrobenthos Diversity) Factors
Widowati Widowati
Waste Technology | VOL. 2
15 Oct 2015
