Large-scale distributed linear algebra with tensor processing units

Adam G M Lewis, Jackson Beall, Martin Ganahl, Markus Hauru, Shrestha Basu Mallick, Guifre Vidal

doi:10.1073/pnas.2122762119

Abstract

We have repurposed Google tensor processing units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs' fast intercore interconnects (ICIs), physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become computationally bound. In this regime, the matrix-multiply units (MXUs) dominate the runtime, yielding impressive scaling, performance, and raw size: Operating in float32 precision, a full 2,048-core pod of third-generation TPUs can multiply two matrices with linear size [Formula: see text] in about 2 min. Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale. As examples, we present 1) QR decomposition; 2) resolution of linear systems; and 3) the computation of matrix functions by polynomial iteration, demonstrated by the matrix polar factorization.
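To make "matrix functions by polynomial iteration" concrete, here is a minimal single-machine sketch of the Newton-Schulz iteration, a standard polynomial iteration that converges to the orthogonal polar factor using only matrix multiplications. It is an illustrative assumption, not necessarily the scheme the authors use: the function names (`matmul`, `polar_newton_schulz`), the Frobenius-norm scaling, and the stopping rule are all hypothetical choices, and plain Python lists stand in for the distributed TPU matrices.

```python
import math


def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_norm(a):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))


def polar_newton_schulz(a, num_iters=60, tol=1e-12):
    """Approximate polar factors (u, h) with a = u h, for square nonsingular a.

    Iterates X <- 1.5 X - 0.5 X X^T X. Dividing by the Frobenius norm first
    puts every singular value in (0, 1], inside the convergence region.
    """
    norm = frobenius_norm(a)
    x = [[v / norm for v in row] for row in a]
    for _ in range(num_iters):
        # Two matrix multiplications per step: X^T X, then X (X^T X).
        xtx = matmul(transpose(x), x)
        x_cubed = matmul(x, xtx)
        x_next = [
            [1.5 * xi - 0.5 * ci for xi, ci in zip(x_row, c_row)]
            for x_row, c_row in zip(x, x_cubed)
        ]
        # Stop once successive iterates agree to within tol.
        delta = frobenius_norm(
            [[n - o for n, o in zip(n_row, o_row)] for n_row, o_row in zip(x_next, x)]
        )
        x = x_next
        if delta < tol:
            break
    # The Hermitian factor follows from the orthogonal one: H = U^T A.
    h = matmul(transpose(x), a)
    return x, h


if __name__ == "__main__":
    u, h = polar_newton_schulz([[4.0, 1.0], [2.0, 3.0]])
    print("U =", u)
    print("H =", h)
```

Each step costs two matrix multiplications and nothing else, which is why iterations of this kind suit hardware where large matrix products dominate the runtime. Convergence slows when the input has very small singular values relative to its norm, so a practical implementation would likely add rescaling or a different polynomial.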

Journal: Proceedings of the National Academy of Sciences
Publication Date: Aug 8, 2022
Citations: 9
License type: CC BY-NC-ND 4.0

Similar Papers

Large-Scale Discrete Fourier Transform on TPUs
Tianjian Lu ... Blake Hechtman
IEEE Access | VOL. 9
01 Jan 2020

Nonuniform Fast Fourier Transform on TPUs
Tianjian Lu ... Chao Ma
13 Apr 2021

Relational Queries with a Tensor Processing Unit
Pedro Holanda ... Hannes Mühleisen
01 Jul 2019

Impact of Structural Faults on Neural Network Performance
Krishna Teja Chitty-Venkata ... Arun Somani
01 Jul 2019
