Abstract

Convolution operations have a significant influence on the overall performance of a convolutional neural network (CNN), especially in edge-computing hardware design. In this paper, we propose a low-power signed convolver hardware architecture that is well suited for low-power edge computing. The basic idea of the proposed convolver design is to combine all multipliers' final additions and the corresponding adder tree into a single partial product matrix (PPM), and then to use a reduction tree algorithm to reduce this PPM. As a result, compared with the state-of-the-art approach, our convolver design not only eliminates many carry-propagation adders but also saves one clock cycle per convolution operation. Moreover, the proposed convolver design can be adapted to different dataflows, including input-stationary, weight-stationary, and output-stationary dataflows. Depending on the dataflow, two types of convolve-accumulate units are proposed to perform the accumulation of convolution results. The results show that, compared with the state-of-the-art approach, the proposed convolver design reduces power consumption by 15.6%. Furthermore, compared with the state-of-the-art approach, the proposed convolve-accumulate units reduce power consumption by 15.7% on average.
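To make the PPM idea concrete, the following behavioral sketch (an illustration of the technique, not the paper's RTL) merges every multiplier's shifted partial-product rows into one matrix, reduces that matrix with 3:2 carry-save compressors, and performs a single final carry-propagate addition. For brevity it uses unsigned 8-bit operands; the paper's design handles signed operands.

```python
# Behavioral sketch (not the authors' RTL): merge all multipliers' partial
# products and the adder tree into ONE partial product matrix (PPM), reduce
# it with 3:2 carry-save compressors, and finish with a single final
# carry-propagate addition. Unsigned operands are an illustrative
# simplification; the paper's convolver is signed.

def ppm_from_products(xs, ws, width=8):
    """Build one PPM holding the shifted AND-array rows of every x*w product."""
    rows = []
    for x, w in zip(xs, ws):
        for i in range(width):                 # one shifted row per weight bit
            if (w >> i) & 1:
                rows.append(x << i)
    return rows

def csa_reduce(rows):
    """Reduce the PPM with 3:2 compressors until at most two rows remain."""
    while len(rows) > 2:
        nxt = []
        for j in range(0, len(rows) - 2, 3):
            a, b, c = rows[j], rows[j + 1], rows[j + 2]
            nxt.append(a ^ b ^ c)                           # sum word, no carry ripple
            nxt.append(((a & b) | (b & c) | (a & c)) << 1)  # carry word
        nxt.extend(rows[len(rows) - len(rows) % 3:])        # pass leftover rows through
        rows = nxt
    return rows

def convolve(xs, ws):
    rows = csa_reduce(ppm_from_products(xs, ws))
    return sum(rows)                           # the single final CPA

# 3x3 kernel window: one CPA in total, instead of one CPA per multiplier
# plus an adder tree of CPAs.
assert convolve([1, 2, 3, 4, 5, 6, 7, 8, 9], [1] * 9) == 45
```

Because the carry-save stages never propagate carries, only the final `sum()` stands in for a carry-propagate adder; this is the saving the abstract attributes to folding the adder tree into the PPM.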

Highlights

  • Since AlexNet achieved outstanding results in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), many research teams have devoted themselves to the development of convolutional neural networks (CNNs), with well-known advances such as ZFNet, GoogLeNet, VGG, and ResNet

  • According to the dataflows described in [20], we present two types of convolve-accumulate units to perform the accumulation of convolution results

  • We find that the proposed convolver design and the proposed convolve-accumulate units reduce both circuit area and power consumption


Summary

Introduction

Since AlexNet achieved outstanding results in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), many research teams have devoted themselves to the development of convolutional neural networks (CNNs), with well-known advances such as ZFNet, GoogLeNet, VGG, and ResNet. Compared with the state-of-the-art approach [24], the proposed approach saves one clock cycle (for performing the final additions of the multipliers and the additions of the adder tree) per convolution operation. In a CNN accelerator, a convolve-accumulate unit is required to add up the convolution results from different channels. In the original PE designs of these CNN accelerators [7,9,20,21,22], convolution operations are performed by separate multipliers and adders. To the best of our knowledge, the proposed convolve-accumulate units are the first work to address the optimization of the underlying hardware circuit for the accumulation of convolution results.
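As a rough behavioral model of the channel accumulation just described, the sketch below folds per-channel convolution results into a locally held partial sum, in the style of an output-stationary dataflow; it reuses the hypothetical convolve() from the earlier sketch, and the function names and loop order are illustrative assumptions rather than the paper's circuit.

```python
# Hypothetical behavioral model of a convolve-accumulate unit under an
# output-stationary dataflow: the partial sum stays in a local register
# while per-channel convolution results are folded in. This reuses the
# illustrative convolve() defined above; it is a sketch, not the paper's PE.

def convolve_accumulate(windows, kernels):
    """Accumulate one output pixel over all input channels.

    windows : per-channel 3x3 input windows (each flattened to 9 values)
    kernels : matching per-channel 3x3 weight kernels
    """
    psum = 0                                   # output-stationary partial sum
    for xs, ws in zip(windows, kernels):       # one channel per step
        psum += convolve(xs, ws)               # convolver result folded in
    return psum

# Two input channels; only channel 0 has nonzero weights.
windows = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 8, 7, 6, 5, 4, 3, 2, 1]]
kernels = [[1] * 9, [0] * 9]
assert convolve_accumulate(windows, kernels) == 45
```

Under an input-stationary or weight-stationary dataflow, the same accumulation would instead be carried by partial sums moving between PEs; the local-register form above is only one of the variants the paper targets.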

Motivation
Proposed Convolver Architecture
Proposed Convolve-Accumulate Units
Experimental Result
Findings
Conclusions
