Abstract

The current trend for deep learning has come with an enormous computational need for billions of Multiply-Accumulate (MAC) operations per inference. Fortunately, reduced precision has demonstrated large benefits with low impact on accuracy, paving the way towards processing in mobile devices and IoT nodes. To this end, various precision-scalable MAC architectures optimized for neural networks have recently been proposed. Yet, it has been hard to comprehend their differences and make a fair judgment of their relative benefits as they have been implemented with different technologies and performance targets. To overcome this, this work exhaustively reviews the state-of-the-art precision-scalable MAC architectures and unifies them in a new taxonomy. Subsequently, these different topologies are thoroughly benchmarked in a 28nm commercial CMOS process, across a wide range of performance targets, and with precision ranging from 2 to 8 bits. Circuits are analyzed for each precision as well as jointly in practical use cases, highlighting the impact of architectures and scalability in terms of energy, throughput, area and bandwidth, aiming to understand the key trends to reduce computation costs in neural-network processing.

Highlights

  • Embedded deep learning has gained a lot of attention nowadays due to its broad application prospects and vast potential market

  • Since the various MAC architectures should have very different optimal operating frequencies, this study explores a broad range of clock targets with frequencies from 600 MHz to 4 GHz

  • This study models the impact of Dynamic Voltage-Frequency Scaling (DVFS) on the circuits, assessing throughput and energy for each mode while sweeping the voltage from 1 V down to 0.8 V
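The DVFS sweep in the last highlight can be approximated with a first-order model: dynamic energy per operation scales with the square of the supply voltage, while maximum clock frequency follows roughly an alpha-power law. The sketch below illustrates this scaling only; the threshold voltage, exponent, and nominal frequency are illustrative assumptions, not values taken from the paper's 28nm characterization.

```python
# First-order DVFS model (illustrative constants, not the paper's data):
# dynamic energy/op ~ C*V^2, max frequency ~ alpha-power law (V - Vt)^a / V.

V_NOM = 1.0      # nominal supply (V), matching the paper's sweep range
V_T = 0.35      # assumed threshold voltage (V), illustrative
ALPHA = 1.3      # assumed velocity-saturation exponent, illustrative
F_NOM_GHZ = 4.0  # assumed frequency at nominal supply, illustrative

def max_frequency(v):
    """Alpha-power-law frequency estimate, normalized to F_NOM_GHZ at V_NOM."""
    shape = lambda x: (x - V_T) ** ALPHA / x
    return F_NOM_GHZ * shape(v) / shape(V_NOM)

def relative_energy_per_op(v):
    """Dynamic energy per operation relative to nominal supply (C*V^2 scaling)."""
    return (v / V_NOM) ** 2

for v in (1.0, 0.9, 0.8):
    print(f"V={v:.1f} V  f≈{max_frequency(v):.2f} GHz  "
          f"E/op≈{relative_energy_per_op(v):.2f}x")
```

Under this model, scaling from 1 V down to 0.8 V cuts dynamic energy per operation to 0.64x at the cost of a lower achievable clock, which is the throughput/energy trade-off the study evaluates per precision mode.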


Summary

INTRODUCTION

Embedded deep learning has gained a lot of attention nowadays due to its broad application prospects and vast potential market. The main challenge to embrace this era of edge intelligence comes from the supply-and-demand gap between the limited energy budget of embedded devices, often battery powered, and the computationally-intensive deep-learning algorithms, requiring billions of Multiply-Accumulate (MAC) operations and data movements. To alleviate this unbalanced relationship, many approaches have been investigated at different levels of abstraction. New topologies have been proposed at circuit level to improve energy or performance beyond conventional design by exploiting data locality or error tolerance [6]–[8]. Among these techniques, reduced-precision computing has demonstrated large benefits with low or negligible impact on the network accuracy [9], [10]. Source code and supplementary materials are available online at: https://github.com/vincent-camus/benchmarkingprecision-scalable-mac-units
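To make the MAC operation and reduced-precision computing concrete, the sketch below shows a multiply-accumulate step whose inputs are clamped to a configurable signed bit width, mimicking precision-scalable operation from 8 down to 2 bits. This is an illustrative model only, not the paper's benchmark code; the `quantize` and `mac` helpers are hypothetical names introduced here.

```python
# Illustrative sketch of a precision-scalable MAC (not the paper's RTL):
# inputs are clamped to a signed two's-complement range of `bits` bits
# before multiplication, then added into a running accumulator.

def quantize(x, bits):
    """Clamp x to the signed two's-complement range of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))

def mac(acc, a, b, bits=8):
    """One multiply-accumulate step on inputs quantized to `bits` bits."""
    return acc + quantize(a, bits) * quantize(b, bits)

# Accumulate a small dot product at 8-bit precision.
acc = 0
for a, b in [(100, 3), (-7, 5), (40, -2)]:
    acc = mac(acc, a, b, bits=8)
print(acc)  # 100*3 + (-7)*5 + 40*(-2) = 185
```

Lowering `bits` shrinks the operand range (and, in hardware, the multiplier area and energy), which is exactly the scalability knob the surveyed architectures expose.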

DATAFLOW IMPLICATIONS OF PRECISION SCALABILITY
SA and ST at Algorithm Level
SA and ST at Precision-Scalable MAC-Unit Level
SA and ST at PE-Array Level
Taxonomy
SURVEY OF SCALABLE MAC ARCHITECTURES
DESIGN AND BENCHMARK METHODOLOGY
Design Considerations and Assumptions
Design Space
DETAILED ANALYSIS
Physical Implementation and Timing Analysis
Power Estimation
Bandwidth
Bandwidth per Operation
Throughput Evaluation
Area Breakdown
Energy Overhead at Full Precision
Energy Scaling
Comparison of Scalable MACs at Nominal Voltage
Comparison of Scalable MACs with DVFS
Introduction and Methodology
Equal Usage
Findings
CONCLUSION