Abstract

In recent years, it has become possible to run deep learning algorithms on edge devices such as microcontrollers, thanks to continuous improvements in neural network optimization techniques such as quantization and neural architecture search. Nonetheless, most of the embedded hardware available today still falls short of the requirements for running deep neural networks. As a result, specialized processors have emerged to improve the inference efficiency of deep learning algorithms. However, most of them are not designed for edge applications, which demand efficient and low-cost hardware. Therefore, we design and prototype a low-cost configurable sparse Neural Processing Unit (NPU). The NPU has a built-in buffer and a reshapable mixed-precision multiply-accumulate (MAC) array. The computing and memory resources of the NPU are parameterized, so different NPU instances can be derived. In addition, users can configure the NPU at runtime to fully utilize its resources. In our experiments, a 200 MHz NPU with only 32 MACs is more than 32 times faster than a 400 MHz STM32H7 when inferring MobileNet-V1. Moreover, the derived NPUs can achieve roofline or even beyond-roofline performance: the buffer and reshapable MAC array push the NPU's attainable performance to the roofline, while support for sparsity allows the NPU to obtain performance beyond the roofline.
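For context, the roofline bound referenced above is the standard model of attainable throughput; the symbols below are generic and are not taken from the paper's own measurements:

P_attainable = \min(P_peak, B \times I)

where P_peak is the peak throughput of the MAC array (operations per second), B is the available memory bandwidth (bytes per second), and I is the arithmetic intensity of the workload (operations per byte). Under this model, exploiting sparsity lets the hardware skip operations on zero operands, so throughput measured against the dense workload can exceed the dense roofline.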
