A BF16 FMA is All You Need for DNN Training

John Osorio, Eric Petit, Greg Henry, Adria Armejach, Marc Casas

doi:10.1109/tetc.2022.3187770


Open Access

https://doi.org/10.1109/tetc.2022.3187770


Abstract

Fused Multiply-Add (FMA) functional units constitute a fundamental hardware component for training Deep Neural Networks (DNNs). Their silicon area grows quadratically with the mantissa bit count of the computer number format, which has motivated the adoption of the BrainFloat16 format (BF16). BF16 features 1 sign, 8 exponent and 7 explicit mantissa bits. Some approaches to train DNNs achieve significant performance benefits by using the BF16 format. However, these approaches must combine BF16 with the standard IEEE 754 Floating-Point 32-bit (FP32) format to achieve state-of-the-art training accuracy, which limits the impact of adopting BF16. This article proposes the first approach able to train complex DNNs entirely using the BF16 format. We propose a new class of FMA operators, $\mathrm{FMA}^{\mathrm{bf16}}_{\mathrm{n\_m}}$, that rely entirely on BF16 FMA hardware instructions and deliver the same accuracy as FP32. $\mathrm{FMA}^{\mathrm{bf16}}_{\mathrm{n\_m}}$ operators achieve performance improvements in the 1.28-1.35× range on ResNet101 with respect to FP32. $\mathrm{FMA}^{\mathrm{bf16}}_{\mathrm{n\_m}}$ enables training complex DNNs on simple low-end hardware devices without requiring expensive FP32 FMA functional units.
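The abstract rests on the BF16 bit layout, and one common way to recover FP32-level accuracy from BF16-only arithmetic is to represent each FP32 value as a sum of several BF16 terms. The Python sketch below illustrates both ideas under stated assumptions: the hypothetical helper `to_bf16` rounds an FP32 value to BF16 with round-to-nearest-even, and the hypothetical helper `split_bf16` peels an FP32 value into `n` BF16 terms by repeatedly rounding the residual. This multi-term splitting is the general technique known from earlier bfloat16 work on higher-precision computation; it is an assumption, not a statement of the exact decomposition used by the $\mathrm{FMA}^{\mathrm{bf16}}_{\mathrm{n\_m}}$ operators.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 encoding of x after rounding it to FP32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits: int) -> float:
    """Decode a 32-bit pattern as an FP32 value (returned as a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return bits_f32(f32_bits(x))


def to_bf16(x: float) -> float:
    """Hypothetical helper: round x to BF16 (round-to-nearest-even).

    BF16 is the upper half of the FP32 encoding: 1 sign bit, 8 exponent bits
    and 7 explicit mantissa bits. NaN inputs are not special-cased here.
    """
    bits = f32_bits(x)
    lsb = (bits >> 16) & 1                      # lowest kept mantissa bit, for ties-to-even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000   # round, then drop the low 16 bits
    return bits_f32(bits)


def split_bf16(x: float, n: int = 3) -> list[float]:
    """Hypothetical helper: write FP32 x as a sum of n BF16 terms.

    Each step rounds the current residual to BF16 and subtracts it. For FP32
    inputs away from the subnormal range the subtraction is exact, so three
    terms cover the whole 24-bit FP32 significand.
    """
    residual = to_f32(x)
    terms = []
    for _ in range(n):
        term = to_bf16(residual)
        terms.append(term)
        residual = to_f32(residual - term)  # emulates an FP32 subtraction
    return terms


if __name__ == "__main__":
    x = to_f32(3.14159265)
    terms = split_bf16(x)
    print(terms)                        # leading BF16 term, then two corrections
    print(sum(terms) == x)              # True: three terms reproduce x exactly
    print(abs(x - terms[0]) / abs(x))   # relative error of BF16 alone, at most 2**-8
```

For typical FP32 inputs, each rounding step absorbs roughly 8 significant bits of the 24-bit significand and each residual subtraction is exact, so three BF16 terms suffice. The same bit counts give a rough feel for the area argument: under the quadratic model stated in the abstract, moving from a 24-bit FP32 significand (23 explicit bits plus the implicit one) to an 8-bit BF16 significand shrinks the multiplier by about $(24/8)^2 = 9\times$, ignoring exponent and alignment logic.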


Journal: IEEE Transactions on Emerging Topics in Computing
Publication Date: Jul 1, 2022
Citations: 5
License type: other-oa


Similar Papers

Design of a Coarse-Grained Processing Element for Matrix Multiplication on FPGA
Yuichi Okuyama ... Shigeyuki Takano
01 Sep 2014

Speeding Up of CGRAs by Reshaping and Stochastic FMA
Tomoya Akabe ... Renyuan Zhang
01 Nov 2021

A Neural Network Training Processor With 8-Bit Shared Exponent Bias Floating Point and Multiple-Way Fused Multiply-Add Trees
Jeongwoo Park ... Sunwoo Lee
IEEE Journal of Solid-State Circuits | VOL. 57
01 Mar 2022

A convergence analysis of Nesterov’s accelerated gradient method in training deep linear neural networks
Xin Liu ... Zhisong Pan
Information Sciences | VOL. 612
05 Sep 2022
