A Power-Efficient Accelerator for Convolutional Neural Networks

Fan Sun, Yuntao Lu, Chao Wang, Chongchong Xu, Xuehai Zhou, Yiwei Zhang, Lei Gong, Xi Li

doi:10.1109/cluster.2017.47

Abstract

Convolutional neural networks (CNNs) have been widely applied in various applications. However, the computation-intensive convolutional layers and memory-intensive fully connected layers pose many challenges for implementing CNNs on embedded platforms. To address these challenges, this work proposes a power-efficient accelerator for CNNs that applies different methods to optimize the convolutional layers and the fully connected layers. For the convolutional layers, the accelerator rearranges the input features into a matrix on the fly while storing them in the on-chip buffers, so that the computation of a convolutional layer can be completed through matrix multiplication. For the fully connected layers, a batch-based method is used to reduce the required memory bandwidth; this computation can also be completed through matrix multiplication. A two-layer pipelined computation method for matrix multiplication is then proposed to increase throughput. As a case study, we implement a widely used CNN model, LeNet-5, on an embedded device. The implementation achieves a peak performance of 34.48 GOP/s and a power efficiency of 19.45 GOP/s/W at a clock frequency of 100 MHz, which outperforms previous approaches.
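The idea of turning a convolutional layer into a matrix multiplication can be illustrated in software. The following is a minimal sketch only, not the accelerator's hardware design: it assumes a single input channel, stride 1, and no padding, and the function names (`im2col`, `matmul`, `conv_direct`, `conv_via_matmul`) are hypothetical names chosen for this illustration. It rearranges each kernel-sized window of the input into one column of a matrix, multiplies by the flattened kernels, and checks the result against a direct sliding-window computation.

```python
def im2col(image, k):
    """Rearrange every k x k window of a 2-D image into one matrix column.

    Assumes stride 1 and no padding (illustrative assumptions).
    Returns a matrix with k*k rows and out_h*out_w columns.
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - k + 1, w - k + 1
    windows = []
    for i in range(out_h):
        for j in range(out_w):
            windows.append([image[i + di][j + dj]
                            for di in range(k) for dj in range(k)])
    # Transpose so that each window becomes a column.
    return [list(row) for row in zip(*windows)]


def matmul(a, b):
    """Plain matrix multiplication on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def conv_direct(image, kernel):
    """Reference sliding-window convolution (cross-correlation, as in CNNs)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]


def conv_via_matmul(image, kernels):
    """Compute all kernels' feature maps with one matrix multiplication."""
    k = len(kernels[0])
    out_w = len(image[0]) - k + 1
    # One row per kernel, flattened in the same order im2col uses.
    weight_rows = [[v for row in kern for v in row] for kern in kernels]
    flat = matmul(weight_rows, im2col(image, k))
    # Reshape each flat output row back into an out_h x out_w feature map.
    return [[r[s:s + out_w] for s in range(0, len(r), out_w)] for r in flat]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernels = [[[1, 0], [0, -1]],
           [[1, 1], [1, 1]]]
feature_maps = conv_via_matmul(image, kernels)
```

The same `matmul` routine also covers a batched fully connected layer: if each input vector in a batch is stored as one column, the weight matrix is read once and reused across every column, which is the general reason batching lowers the memory bandwidth needed per input.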


Similar Papers


UniCNN: A Pipelined Accelerator Towards Uniformed Computing for CNNs
Fan Sun ... Xi Li
International Journal of Parallel Programming | VOL. 46
27 Sep 2017

Helix Matrix Transformation Combined With Convolutional Neural Network Algorithm for Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry-Based Bacterial Identification.
Jin Ling ... Yufei Song
Frontiers in Microbiology | VOL. 11
12 Nov 2020

Prediction of Diabetic Retinopathy using Deep Learning with Preprocessing
S Balaji ... D Gokulakrishnan
EAI Endorsed Transactions on Pervasive Health and Technology | VOL. 10
22 Feb 2024

Bi-stream CNN Down Syndrome screening model based on genotyping array
Bing Feng ... William Hoskins
BMC Medical Genomics | VOL. 11
01 Nov 2018
