Abstract

Many existing studies on accelerating convolutional neural networks (CNNs) use parallel data-operation schemes to increase throughput. This study proposes area-efficient parallel multiplication unit (PMU) designs for a CNN accelerator that exploits parallelism across the output channels of a CNN layer, multiplying a common feature-map pixel by multiple CNN kernel weights in parallel. First, tailored PMU designs are proposed for CNNs with specific low-precision 3-to-8-bit weights. Second, the proposed 5-to-8-bit PMU designs are extended with two-clock-cycle operations to develop PMUs whose weight precision is scalable to 10/12/14/16 bits. Compared with 16-path PMUs that directly use carry-save-adder array multipliers, our PMU designs achieve area reductions of 28.19%–56.09% for 3-to-8-bit weights and 22.10%–30.71% for 10-/12-/14-/16-bit weights. Moreover, a resultant 16-path, 16-bit-weight PMU is verified through a system-on-chip (SoC) field-programmable gate array (FPGA) implementation to demonstrate CNN inference.
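To make the output-channel-parallel scheme concrete, the following behavioral sketch in C illustrates the two ideas the abstract describes: one shared feature-map pixel multiplied by several weights per cycle, and a 16-bit-weight multiply folded into two 8-bit sub-multiplications over two clock cycles. This is only an illustrative software model under assumed conventions (the PATHS constant, function names, and the byte-split recombination are assumptions), not the paper's hardware design.

```c
#include <stdint.h>

#define PATHS 16  /* hypothetical path count, mirroring the 16-path PMU */

/* Single-cycle behavioral model: multiply one shared feature-map pixel
 * by PATHS low-precision weights in parallel (output-channel parallelism). */
void pmu_1cycle(int16_t pixel, const int8_t w[PATHS], int32_t out[PATHS])
{
    for (int p = 0; p < PATHS; ++p)
        out[p] = (int32_t)pixel * w[p];  /* every path reuses the same pixel */
}

/* Two-cycle behavioral model for a 16-bit weight: cycle 1 multiplies the
 * unsigned low byte, cycle 2 the signed high byte, and the two partial
 * products are recombined with a shift-add (x256 is equivalent to << 8). */
void pmu_2cycle(int16_t pixel, const int16_t w[PATHS], int32_t out[PATHS])
{
    for (int p = 0; p < PATHS; ++p) {
        uint8_t lo = (uint8_t)(w[p] & 0xFF);  /* cycle 1: low byte  */
        int8_t  hi = (int8_t)(w[p] >> 8);     /* cycle 2: high byte */
        out[p] = (int32_t)pixel * lo + (int32_t)pixel * hi * 256;
    }
}
```

Since w = hi*256 + lo, the recombination yields pixel*lo + pixel*hi*256 = pixel*w, so the two-cycle path reuses the same 8-bit multiplier datapath for the higher-precision weight, which is the reuse that the two-clock-cycle extension trades latency for area.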
