Abstract

The large and ever-increasing size of state-of-the-art convolutional neural networks (CNNs) poses both throughput and energy challenges to the underlying hardware, especially in embedded and mobile computing platforms. Weight quantization is a promising method to reduce the computational complexity of neural networks without significant loss of accuracy. In this paper, we show that the value locality of quantized filter weights introduces substantial computation redundancy in CNNs. To exploit this redundancy and reduce network complexity, this paper proposes CORNC, a computation reuse-aware accelerator for CNNs. CORNC leverages two opportunities to eliminate repetitive computations: (1) the algorithmic structure of CNNs, in which each element of the input data (and of the intermediate feature maps) is multiplied by tens to thousands of filter weights, and (2) the data value locality of the filter weights when low-precision quantization is applied. Experimental results show that, by eliminating a considerable number of repetitive multiplications, computation reuse offers 10.5%–20.9% energy reduction and 13.4%–41.6% latency reduction over state-of-the-art accelerators across a set of widely used CNN benchmarks.
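To make the reuse opportunity concrete, the following is a minimal software sketch (not the paper's hardware design) of computation reuse with quantized weights: because low-precision quantization leaves only a handful of distinct weight values, each input activation needs to be multiplied by each distinct value only once, and every filter that contains a repeated value can reuse the stored product. The function name `quantized_conv_reuse`, the 1-D convolution layout, and the index-based weight encoding are illustrative assumptions, not the accelerator's actual dataflow.

```python
import numpy as np

def quantized_conv_reuse(inputs, weight_indices, weight_levels):
    """1-D convolution where weights are stored as indices into a small table
    of quantized levels (e.g., 16 levels for 4-bit weights).

    inputs         : 1-D array of input activations
    weight_indices : (num_filters, k) integer array indexing weight_levels
    weight_levels  : 1-D array of the distinct quantized weight values

    Illustrative sketch only; names and layout are assumptions.
    """
    num_filters, k = weight_indices.shape
    out_len = len(inputs) - k + 1
    out = np.zeros((num_filters, out_len))

    for pos in range(out_len):
        for j in range(k):
            x = inputs[pos + j]
            # Multiply this activation by each distinct quantized level once
            # (e.g., 16 multiplications for 4-bit weights)...
            products = x * weight_levels
            # ...then every filter reuses the precomputed product for its
            # weight value instead of repeating the multiplication, so the
            # cost per activation scales with the number of levels rather
            # than the (much larger) number of filter weights.
            for f in range(num_filters):
                out[f, pos] += products[weight_indices[f, j]]
    return out
```

In this sketch the multiplication count per input element drops from the number of filters to the number of quantization levels, which mirrors the redundancy the abstract attributes to weight value locality.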
