Energy-efficient, high-performance, highly-compressed deep neural network design using block-circulant matrices

Siyu Liao, Bo Yuan, Yanzhi Wang, Zhe Li, Qinru Qiu, Xue Lin

doi:10.1109/iccad.2017.8203813

Abstract

Deep neural networks (DNNs) have emerged as the most powerful machine learning technique in numerous artificial intelligence applications. However, the large size of DNNs makes them both computation- and memory-intensive, which limits the hardware performance of dedicated DNN accelerators. In this paper, we propose a holistic framework for energy-efficient, high-performance, highly-compressed DNN hardware design. First, we propose block-circulant matrix-based DNN training and inference schemes, which theoretically guarantee a Big-O reduction in both the computational cost (from O(n²) to O(n log n)) and the storage requirement (from O(n²) to O(n)) of DNNs. Second, we specifically optimize the hardware architecture, especially the key fast Fourier transform (FFT) module, to improve overall energy efficiency, computation performance, and resource cost. Third, we propose a design flow that performs hardware-software co-optimization to strike a good balance between the test accuracy and the hardware performance of DNNs. Following this design flow, two block-circulant matrix-based DNNs on two different datasets are implemented and evaluated on FPGA. Fixed-point quantization combined with the proposed block-circulant matrix-based inference scheme enables the networks to achieve up to 3.5 TOPS computation performance and 3.69 TOPS/W energy efficiency, while reducing memory by 108× to 116× with negligible accuracy degradation.
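The complexity claims above rest on one identity: multiplying a circulant matrix by a vector is a circular convolution, so it can be computed with FFTs instead of a dense product. The following NumPy sketch illustrates that idea for a block-circulant weight matrix; it is not the authors' implementation. It assumes each k×k block is defined by its first column (the exact convention used in the paper is not stated here), works in floating point rather than the fixed-point arithmetic of the FPGA design, and uses hypothetical function names.

```python
import numpy as np


def circulant(c):
    """Build the dense k x k circulant matrix whose first column is c.

    Entry (i, j) equals c[(i - j) mod k]; used here only to check the FFT path.
    """
    k = len(c)
    idx = (np.arange(k)[:, None] - np.arange(k)[None, :]) % k
    return c[idx]


def block_circulant_dense(w):
    """Expand compact weights w of shape (p, q, k) into the dense (p*k, q*k) matrix."""
    p, q, k = w.shape
    return np.block([[circulant(w[i, j]) for j in range(q)] for i in range(p)])


def block_circulant_matvec(w, x):
    """Compute y = W x with FFTs, never forming the dense W.

    w: (p, q, k) array; w[i, j] is the first column of block (i, j).
    x: length q*k input vector.
    Each block product is a circular convolution, so
    y_i = IFFT( sum_j FFT(w[i, j]) * FFT(x_j) ), costing O(p*q*k log k).
    """
    p, q, k = w.shape
    x_blocks = x.reshape(q, k)
    w_f = np.fft.fft(w, axis=-1)             # (p, q, k) spectra of the defining vectors
    x_f = np.fft.fft(x_blocks, axis=-1)      # (q, k) spectra of the input blocks
    y_f = np.einsum("ijk,jk->ik", w_f, x_f)  # element-wise products, summed over j
    return np.fft.ifft(y_f, axis=-1).real.reshape(p * k)


rng = np.random.default_rng(0)
p, q, k = 4, 8, 16                  # hypothetical layer: 64 x 128, block size 16
w = rng.standard_normal((p, q, k))
x = rng.standard_normal(q * k)

y_fft = block_circulant_matvec(w, x)
y_dense = block_circulant_dense(w) @ x
print(np.allclose(y_fft, y_dense))  # True
print((p * k) * (q * k) // w.size)  # 16: storage shrinks by the block size k
```

In this sketch a layer with block size k stores n²/k parameters instead of n², and the extreme case of a single n×n circulant block gives the O(n) storage and O(n log n) compute quoted in the abstract. The abstract attributes its reported results to fixed-point quantization combined with block-circulant inference, which this floating-point example does not model.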

Similar Papers

Structured representation in deep neural network systems
Caiwen Ding
10 May 2021

CirCNN
Caiwen Ding ... Youwei Zhuo
14 Oct 2017

Embedding error correction into crossbars for reliable matrix vector multiplication using emerging devices
Qiuwen Lou ... Siddharth Joshi
10 Aug 2020

Stochastic Cumulative DNN Inference With RL-Aided Adaptive IoT Device-Edge Collaboration
Kaige Qu ... Weisen Shi
IEEE Internet of Things Journal | VOL. 10
15 Oct 2023
