Abstract

Convolution operations have a significant influence on the overall performance of a convolutional neural network (CNN), especially in edge-computing hardware design. In this paper, we propose a low-power signed convolver hardware architecture that is well suited for low-power edge computing. The basic idea of the proposed convolver design is to combine all multipliers' final additions and the corresponding adder tree into a single partial product matrix (PPM), and then to use a reduction tree algorithm to reduce this PPM. As a result, compared with the state-of-the-art approach, our convolver design not only eliminates many carry-propagation adders but also saves one clock cycle per convolution operation. Moreover, the proposed convolver design can be adapted to different dataflows, including input-stationary, weight-stationary, and output-stationary dataflows. Depending on the dataflow, two types of convolve-accumulate units are proposed to perform the accumulation of convolution results. The results show that, compared with the state-of-the-art approach, the proposed convolver design reduces power consumption by 15.6%. Furthermore, compared with the state-of-the-art approach, the proposed convolve-accumulate units reduce power consumption by 15.7% on average.
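To make the PPM idea concrete, the following behavioral sketch (an illustration of the technique, not the paper's RTL) merges every multiplier's shifted partial-product rows into one matrix, reduces that matrix with 3:2 carry-save compressors, and performs a single final carry-propagate addition. For brevity it uses unsigned 8-bit operands; the paper's design handles signed operands.

```python
# Behavioral sketch (not the authors' RTL): merge all multipliers' partial
# products and the adder tree into ONE partial product matrix (PPM), reduce
# it with 3:2 carry-save compressors, and finish with a single final
# carry-propagate addition. Unsigned operands are an illustrative
# simplification; the paper's convolver is signed.

def ppm_from_products(xs, ws, width=8):
    """Build one PPM holding the shifted AND-array rows of every x*w product."""
    rows = []
    for x, w in zip(xs, ws):
        for i in range(width):                 # one shifted row per weight bit
            if (w >> i) & 1:
                rows.append(x << i)
    return rows

def csa_reduce(rows):
    """Reduce the PPM with 3:2 compressors until at most two rows remain."""
    while len(rows) > 2:
        nxt = []
        for j in range(0, len(rows) - 2, 3):
            a, b, c = rows[j], rows[j + 1], rows[j + 2]
            nxt.append(a ^ b ^ c)                           # sum word, no carry ripple
            nxt.append(((a & b) | (b & c) | (a & c)) << 1)  # carry word
        nxt.extend(rows[len(rows) - len(rows) % 3:])        # pass leftover rows through
        rows = nxt
    return rows

def convolve(xs, ws):
    rows = csa_reduce(ppm_from_products(xs, ws))
    return sum(rows)                           # the single final CPA

# 3x3 kernel window: one CPA in total, instead of one CPA per multiplier
# plus an adder tree of CPAs.
assert convolve([1, 2, 3, 4, 5, 6, 7, 8, 9], [1] * 9) == 45
```

Because the carry-save stages never propagate carries, only the final `sum()` stands in for a carry-propagate adder; this is the saving the abstract attributes to folding the adder tree into the PPM.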

Highlights

  • Since AlexNet achieved outstanding results in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), many research teams have devoted themselves to the development of convolutional neural networks (CNNs), with well-known advances such as ZFNet, GoogLeNet, VGG, and ResNet

  • According to the dataflows described in [20], we present two types of convolve-accumulate units to perform the accumulation of convolution results

  • We find that the proposed convolver design and the proposed convolve-accumulate units reduce both circuit area and power consumption


Summary

Introduction

Since AlexNet achieved outstanding results in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), many research teams have devoted themselves to the development of convolutional neural networks (CNNs), with well-known advances such as ZFNet, GoogLeNet, VGG, and ResNet. Compared with the state-of-the-art approach [24], the proposed approach saves one clock cycle (for performing the final additions of the multipliers and the additions of the adder tree) per convolution operation. In a CNN accelerator, a convolve-accumulate unit is required to add up the convolution results from different channels. In the original PE designs of these CNN accelerators [7,9,20,21,22], convolution operations are performed by separate multipliers and adders. To the best of our knowledge, the proposed convolve-accumulate units are the first work to address the optimization of the underlying hardware circuit for the accumulation of convolution results.
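As a rough behavioral model of the channel accumulation just described, the sketch below folds per-channel convolution results into a locally held partial sum, in the style of an output-stationary dataflow; it reuses the hypothetical convolve() from the earlier sketch, and the function names and loop order are illustrative assumptions rather than the paper's circuit.

```python
# Hypothetical behavioral model of a convolve-accumulate unit under an
# output-stationary dataflow: the partial sum stays in a local register
# while per-channel convolution results are folded in. This reuses the
# illustrative convolve() defined above; it is a sketch, not the paper's PE.

def convolve_accumulate(windows, kernels):
    """Accumulate one output pixel over all input channels.

    windows : per-channel 3x3 input windows (each flattened to 9 values)
    kernels : matching per-channel 3x3 weight kernels
    """
    psum = 0                                   # output-stationary partial sum
    for xs, ws in zip(windows, kernels):       # one channel per step
        psum += convolve(xs, ws)               # convolver result folded in
    return psum

# Two input channels; only channel 0 has nonzero weights.
windows = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 8, 7, 6, 5, 4, 3, 2, 1]]
kernels = [[1] * 9, [0] * 9]
assert convolve_accumulate(windows, kernels) == 45
```

Under an input-stationary or weight-stationary dataflow, the same accumulation would instead be carried by partial sums moving between PEs; the local-register form above is only one of the variants the paper targets.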

Motivation
Proposed Convolver Architecture
Proposed Convolve-Accumulate Units
Experimental Result
Findings
Conclusions
