Abstract
Modern applications increasingly rely on the simultaneous execution of multiple DNNs, and Heterogeneous DNN Accelerators (HDAs) have emerged as a solution to this trend. In this work, we propose, implement, and evaluate low-precision Neural Processing Units (NPUs) that serve as building blocks for constructing HDAs, targeting the efficient deployment of multi-DNN workloads. Moreover, we design and evaluate HDA designs that increase overall throughput while reducing energy consumption during NN inference. At design time, we implement HDAs inspired by the big.LITTLE computing paradigm, consisting of 8-bit NPUs combined with lower-precision NPUs. Additionally, an NN-to-NPU scheduling methodology decides at run-time how to map each executed NN to a suitable NPU based on an accuracy-drop threshold value. Our hardware/software co-design reduces the energy and response time of NNs by 29% and 10%, respectively, compared to state-of-the-art homogeneous architectures. This comes with a negligible accuracy drop of merely 0.5%. Similar to the traditional CPU big.LITTLE, our asymmetric NPU design can open new doors for novel DNN accelerator architectures, due to its profound role in increasing the efficiency of DNNs with minimal losses in accuracy.
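The run-time NN-to-NPU scheduling described above can be sketched as a simple threshold-based selection: prefer the lowest-precision (most energy-efficient) NPU whose estimated accuracy drop for the given NN stays within the threshold, otherwise fall back to the high-precision "big" NPU. The NPU names, bit-widths, and accuracy-drop figures below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of a threshold-based NN-to-NPU scheduler.
# All concrete names and numbers here are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class NPU:
    name: str
    bit_width: int  # arithmetic precision of the NPU


def schedule_nn(npus, accuracy_drop, threshold=0.5):
    """Map an NN to the lowest-precision NPU whose estimated accuracy
    drop (in percentage points) stays within `threshold`."""
    # Prefer lower bit-widths: cheaper in energy, but may lose accuracy.
    for npu in sorted(npus, key=lambda n: n.bit_width):
        if accuracy_drop[npu.name] <= threshold:
            return npu
    # Fall back to the highest-precision ("big") NPU if no low-precision
    # NPU meets the accuracy constraint.
    return max(npus, key=lambda n: n.bit_width)


npus = [NPU("big-8bit", 8), NPU("little-4bit", 4)]
# Assumed per-NPU accuracy drops (percentage points) for two example NNs.
drops_nn_a = {"big-8bit": 0.0, "little-4bit": 0.3}
drops_nn_b = {"big-8bit": 0.0, "little-4bit": 2.1}

print(schedule_nn(npus, drops_nn_a).name)  # little NPU meets the threshold
print(schedule_nn(npus, drops_nn_b).name)  # falls back to the big NPU
```

In an actual HDA, the per-NPU accuracy-drop estimates would come from offline profiling of each NN at each precision; the run-time decision itself then reduces to a cheap table lookup as above.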