A High-Performance FPGA-Based Depthwise Separable Convolution Accelerator

Jiye Huang, Zhijin Zhao, Tongdong Guo, Xin Liu

doi:10.3390/electronics12071571

Open Access

https://doi.org/10.3390/electronics12071571

Journal: Electronics
Publication Date: Mar 27, 2023
Citations: 2
License type: CC BY 4.0

Affiliation: Hangzhou Dianzi University

Abstract

Depthwise separable convolution (DSC) significantly reduces the number of parameters and floating-point operations with an acceptable loss of accuracy, and it has been widely used in lightweight convolutional neural network (CNN) models. In practical applications, however, DSC accelerators based on graphics processing units (GPUs) cannot fully exploit the performance of DSC and are unsuitable for mobile application scenarios. Moreover, low resource utilization due to idle engines is a common problem in DSC accelerator design. In this paper, a high-performance DSC hardware accelerator based on field-programmable gate arrays (FPGAs) is proposed. A highly reusable and scalable multiplication and accumulation engine is proposed to improve the utilization of computational resources. Efficient convolution algorithms are proposed for depthwise convolution (DWC) and pointwise convolution (PWC), respectively, to reduce on-chip memory occupancy. These algorithms also achieve partial fusion between PWC and DWC and improve off-chip memory access efficiency. To maximize bandwidth utilization and reduce latency when reading feature maps, an address mapping method for off-chip accesses is proposed. The performance of the proposed accelerator is demonstrated by implementing MobileNetV2 on an Intel Arria 10 GX660 FPGA using Verilog HDL. The experimental results show that the proposed DSC accelerator achieves a performance of 205.1 FPS, 128.8 GFLOPS, and 0.24 GOPS/DSP for input images of size 224×224×3.
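The claimed reduction in parameters and operations can be illustrated with a small cost model. The sketch below is not taken from the paper: it assumes stride 1 with "same" padding, ignores biases, and uses a hypothetical layer shape (56×56 output, 64 input channels, 128 output channels, 3×3 kernel) chosen only for illustration. It compares a standard convolution with a DWC followed by a PWC.

```python
def standard_conv_cost(h, w, c_in, c_out, k):
    """Parameters and multiply-accumulates (MACs) of a k x k standard
    convolution with an h x w x c_out output (bias ignored)."""
    params = k * k * c_in * c_out
    macs = h * w * params
    return params, macs


def dsc_cost(h, w, c_in, c_out, k):
    """Parameters and MACs of a depthwise separable convolution:
    a k x k depthwise convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution (bias ignored)."""
    dw_params = k * k * c_in      # DWC: one k x k filter per channel
    pw_params = c_in * c_out      # PWC: 1 x 1 filters mixing channels
    params = dw_params + pw_params
    macs = h * w * params         # assumes stride 1, "same" padding
    return params, macs


# Hypothetical layer shape, for illustration only.
h = w = 56
c_in, c_out, k = 64, 128, 3

std_params, std_macs = standard_conv_cost(h, w, c_in, c_out, k)
dsc_params, dsc_macs = dsc_cost(h, w, c_in, c_out, k)

# Under these assumptions the cost ratio simplifies to 1/c_out + 1/k**2.
ratio = dsc_macs / std_macs

print(f"standard: {std_params} params, {std_macs} MACs")
print(f"DSC:      {dsc_params} params, {dsc_macs} MACs")
print(f"DSC / standard cost ratio: {ratio:.4f}")
```

For a 3×3 kernel the ratio approaches 1/9 as the number of output channels grows, so in this model DSC needs roughly an eighth to a ninth of the arithmetic of the equivalent standard convolution.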
