Abstract

Due to the high throughput and high computational capability of convolutional neural networks (CNNs), researchers are paying increasing attention to the design of CNN hardware accelerator architectures. Accordingly, in this paper, we propose a block parallel computing algorithm based on the matrix transformation computing algorithm (MTCA) to realize convolution expansion and resolve the blocking problem of the intermediate matrix, enabling highly parallel implementation in hardware. Moreover, we provide a specific calculation method for the optimal partitioning of the matrix multiplication to optimize performance. In our evaluation, the proposed method saves more than 60% of hardware storage space compared with the im2col (image-to-column) approach; for large-scale convolutions, it saves nearly 82%. Under the accelerator architecture framework designed in this paper, we achieve 26.7–33.4 GFLOPS (depending on convolution type) on an FPGA (field-programmable gate array) by reducing bandwidth requirements and improving data reusability. The design is 1.2×–4.0× faster than memory-efficient convolution (MEC) and im2col, and represents an effective solution for large-scale convolution accelerators.
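To put the storage comparison above in context, the sketch below (illustrative Python; the layer dimensions are assumptions for a typical CNN layer, not figures from the paper) estimates how much the im2col intermediate matrix inflates the input: each K×K receptive field is unrolled into its own column, so overlapping patches are duplicated.

```python
# Illustrative estimate of the im2col intermediate-buffer blow-up.
# im2col unrolls every KxK receptive field of a C x H x W input into a
# column, producing a (C*K*K) x (H_out*W_out) intermediate matrix.
def im2col_buffer_elems(c, h, w, k, stride=1, pad=0):
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    return (c * k * k) * (h_out * w_out)

# Assumed example layer: 64 channels, 56x56 feature map, 3x3 kernel.
c, h, w, k = 64, 56, 56, 3
direct = c * h * w                                   # elements in the input
unrolled = im2col_buffer_elems(c, h, w, k, stride=1, pad=1)
print(unrolled / direct)                             # prints 9.0: a 9x blow-up
```

For a 3×3 kernel at stride 1 with "same" padding, the im2col buffer is roughly 9× the input, which is the redundancy that the MTCA-style blocked expansion in this paper aims to avoid.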

Highlights

  • At present, convolutional neural networks (CNNs) are widely used in image classification [1], target recognition [2,3], and semantic segmentation [4,5]

  • In order to solve this problem, accelerator designs based on various platforms have been proposed, using graphics processing units (GPUs) [7], customized application-specific integrated circuits (ASICs) [8,9,10,11], field-programmable gate arrays (FPGAs), and other hardware to accelerate CNNs

  • We propose a CNN accelerator design based on a matrix transformation computing algorithm (MTCA) [15] decomposition


Summary

Introduction

CNNs are widely used in image classification [1], target recognition [2,3], and semantic segmentation [4,5]. A CNN is essentially composed of convolution layers, ReLU layers, and other layers. In a CNN model, the convolution layers account for more than 85% of the total computation [6], which imposes a huge workload. Software-only CNN implementations cannot meet current high-speed application requirements. To solve this problem, accelerator designs based on various platforms have been proposed, using graphics processing units (GPUs) [7], customized application-specific integrated circuits (ASICs) [8,9,10,11], field-programmable gate arrays (FPGAs), and other hardware to accelerate CNNs. However, due to power consumption, development cost, and development cycles, the research and development of GPU- and ASIC-based accelerators are largely limited.
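To make the conventional baseline concrete, the following minimal NumPy sketch (not the paper's MTCA; layout and dimensions are illustrative assumptions) shows how im2col lowers a convolution layer to a single matrix multiplication, which is why so much of a CNN's computation reduces to GEMM:

```python
import numpy as np

def im2col(x, k, stride=1):
    # Unroll a (C, H, W) input into a (C*k*k, H_out*W_out) matrix:
    # one column per k x k receptive field.
    c, h, w = x.shape
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    cols = np.empty((c * k * k, h_out * w_out), dtype=x.dtype)
    idx = 0
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            cols[:, idx] = patch.ravel()
            idx += 1
    return cols, h_out, w_out

def conv2d_im2col(x, weights, stride=1):
    # weights: (M, C, k, k); returns the (M, H_out, W_out) output.
    m, c, k, _ = weights.shape
    cols, h_out, w_out = im2col(x, k, stride)
    # The whole convolution is now one matrix multiplication.
    out = weights.reshape(m, c * k * k) @ cols
    return out.reshape(m, h_out, w_out)
```

The duplication inside `im2col` (overlapping patches copied into separate columns) is the storage cost that blocked schemes such as MEC and the MTCA-based design in this paper try to reduce.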
