Abstract

Field-programmable gate arrays (FPGAs) are widely considered a promising platform for convolutional neural network (CNN) acceleration. However, the large number of parameters in CNNs imposes heavy computing and memory burdens on FPGA-based CNN implementations. To solve this problem, this paper proposes an optimized compression strategy and realizes an FPGA-based accelerator for CNNs. Firstly, a reversed-pruning strategy is proposed that reduces the number of parameters of AlexNet by a factor of 13× without accuracy loss on the ImageNet dataset; peak-pruning is further introduced to achieve better compressibility. Moreover, quantization gives another 4× reduction with negligible loss of accuracy. Secondly, efficient storage techniques are presented that reduce the overall caching overhead of the convolutional layers and the fully connected layers, respectively. Finally, the effectiveness of the proposed strategy is verified by an accelerator implemented on a Xilinx ZCU104 evaluation board. By improving existing pruning techniques and the storage format of sparse data, we significantly reduce the size of AlexNet by 28×, from 243 MB to 8.7 MB. In addition, the overall performance of our accelerator reaches 9.73 fps on the compressed AlexNet. Compared with central processing unit (CPU) and graphics processing unit (GPU) platforms, our implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, along with 822.0× and 15.8× improvements in energy efficiency, respectively. This novel compression strategy provides a reference for other neural network applications, including CNNs, long short-term memory networks (LSTMs), and recurrent neural networks (RNNs).
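The page carries no code, but the prune-then-quantize pipeline the abstract describes is easy to illustrate. The sketch below (Python/NumPy; the function names, the 92% sparsity target, and the 8-bit width are illustrative assumptions, not the paper's exact reversed-pruning or peak-pruning procedure) zeroes the smallest-magnitude weights of a layer and then linearly quantizes the survivors, which is where reductions on the order of the reported 13× and additional 4× would come from.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights until roughly `sparsity`
    of them are zero. A generic stand-in for the paper's reversed-pruning
    and peak-pruning passes, which choose what to prune more carefully."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k)[k]          # k-th smallest magnitude
    return weights * (np.abs(weights) >= threshold)

def quantize_linear(weights, num_bits=8):
    """Symmetric linear quantization of the surviving weights.
    Going from 32-bit floats to 8-bit integers is roughly the 4x
    reduction the abstract attributes to quantization."""
    scale = np.max(np.abs(weights)) / (2 ** (num_bits - 1) - 1)
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

# Hypothetical usage on one fully connected layer:
rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
w_pruned = prune_by_magnitude(w, sparsity=0.92)   # ~13x fewer nonzeros
w_q, scale = quantize_linear(w_pruned)            # int8 weights + one scale
w_approx = w_q.astype(np.float32) * scale         # dequantize for inference
```

In a reversed-pruning scheme, a pass like this would presumably be applied layer by layer with retraining in between to recover accuracy; the layer order and per-layer thresholds are exactly what the paper's strategy optimizes, and are not reproduced here.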

Highlights

  • Deep convolutional neural networks (DCNNs) [1] have shown significant advantages in many artificial intelligence (AI) applications, such as computer vision and natural language processing [2,3,4]. The performance of DCNNs is improving rapidly: the winner of the ImageNet classification challenge raised the top-1 classification accuracy from 57.2% in 2012 (AlexNet) to 76.1% in 2015 (ResNet-152) [5,6].

  • After a thorough investigation of the differences and connections between the convolutional layer and the fully connected layer, we proposed a reversed-pruning and peak-pruning strategy to reduce the number of weights.

  • The convolver accomplished the window convolution operation, benefitting from the efficient storage approach we proposed in model compression (see the sketch after this list).
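Because the accelerator streams sparse weights, the storage format matters as much as the pruning itself. The paper's own per-layer storage scheme is not reproduced on this page, so the sketch below only shows the well-known relative-index encoding that deep-compression-style accelerators commonly build on: each nonzero is stored as a value plus the zero-run gap since the previous entry, with a filler zero whenever a run overflows the index width. All names are illustrative.

```python
import numpy as np

def encode_relative_index(row, index_bits=4):
    """Store each nonzero as (value, gap), where gap is the number of
    zeros since the previous stored entry. If a zero run exceeds what
    `index_bits` can express, a filler zero entry is emitted, as in
    deep-compression-style sparse formats."""
    max_gap = (1 << index_bits) - 1
    values, gaps, gap = [], [], 0
    for v in row:
        if v == 0 and gap < max_gap:
            gap += 1                  # extend the current zero run
        else:
            values.append(v)          # nonzero, or filler zero at max gap
            gaps.append(gap)
            gap = 0
    return np.array(values), np.array(gaps, dtype=np.uint8)

def decode_relative_index(values, gaps, length):
    """Inverse of the encoder, for checking round trips."""
    row = np.zeros(length, dtype=np.asarray(values).dtype)
    pos = 0
    for v, g in zip(values, gaps):
        pos += g
        row[pos] = v
        pos += 1
    return row

row = np.array([0, 0, 3, 0, 0, 0, 0, 5, 0, 1])
vals, gaps = encode_relative_index(row)
assert np.array_equal(decode_relative_index(vals, gaps, row.size), row)
```

Under these assumed widths, each stored weight costs about 12 bits (4-bit gap plus 8-bit quantized value) instead of 32, and this saving compounds with the pruning ratio.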


Summary

Introduction

Deep convolutional neural networks (DCNNs) [1] have shown significant advantages in many artificial intelligence (AI) applications, such as computer vision and natural language processing [2,3,4]. Different from previous approaches, Han presents "deep compression" and the "efficient speech recognition engine" (ESE) to support sparse recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) [15,16,17], and provides the "efficient inference engine" (EIE) to perform inference on compressed DNNs [18]. These software-hardware co-designs show great advantages in accelerating deep learning, but there is still a lack of analysis of the connection between the fully connected layer and the convolutional layer, leaving plenty of room for algorithm optimization. A compressed CNN model requires less computation and memory, indicating great potential to improve speed and energy efficiency.

Motivation for Compressing CNNs
Network Model Compression
Reversed-Pruning
Data Quantization
Efficient Storage
Hardware
Overall Architecture
PE Architecture
Performance Analysis
Findings
Conclusions