Abstract

Convolutional Neural Networks (CNNs) play an important role in many machine learning tasks for speech, image, and video processing. The growing demand for faster processing in real-time applications requires high-speed CNN implementations, yet CNNs generally incur high latency because the convolutional layers are computationally intensive. Although state-of-the-art architectures already provide efficient dataflow for the convolutional operations, this paper proposes a hardware-efficient, high-speed convolution block for ASIC implementation of CNNs. The proposed convolution block is built around a novel bit-level multiply-accumulator (BLMAC) that combines a modified Booth encoder with a Wallace reduction tree. Because the BLMAC is the core component of the convolution process, its time-optimized implementation significantly shortens the critical path of the overall architecture. A critical-path analysis and the dataflow strategy are also provided to demonstrate the acceleration achieved by the proposed design. The proposed architecture was synthesized with Synopsys Design Compiler using a 65 nm standard-cell library; the ASIC synthesis results show at least a 53% reduction in latency, a 52.2% reduction in area-delay product, and a 54.2% reduction in power-delay product compared with the state-of-the-art architecture.
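
To make the arithmetic concrete, the following is a minimal behavioral sketch in C of radix-4 modified Booth recoding, the encoding scheme named above: the multiplier is scanned in overlapping three-bit groups, each of which selects a partial product from {-2a, -a, 0, +a, +2a}. The 16-bit operand width is an assumption for illustration, and this software model is not the paper's BLMAC hardware; in the proposed design the partial products would be reduced by a Wallace tree rather than summed sequentially.

#include <stdint.h>
#include <stdio.h>

/* Behavioral sketch only (assumed 16-bit operands; not the paper's BLMAC):
 * radix-4 modified Booth recoding. Each overlapping 3-bit group
 * {b[2i+1], b[2i], b[2i-1]} selects a digit in {-2, -1, 0, +1, +2}; the
 * corresponding partial product is a*digit weighted by 4^i. In hardware,
 * a Wallace tree would reduce these partial products; here they are
 * summed sequentially just to show functional correctness. */
static int32_t booth_radix4_mul(int16_t a, int16_t b)
{
    int32_t acc = 0;
    int32_t mb = ((int32_t)b) << 1;          /* append implicit b[-1] = 0      */
    for (int i = 0; i < 8; i++) {            /* 16-bit multiplier -> 8 digits  */
        int group = (mb >> (2 * i)) & 0x7;   /* bits b[2i+1], b[2i], b[2i-1]   */
        int digit;
        switch (group) {
        case 1: case 2: digit = +1; break;
        case 3:         digit = +2; break;
        case 4:         digit = -2; break;
        case 5: case 6: digit = -1; break;
        default:        digit =  0; break;   /* groups 000 and 111             */
        }
        acc += (int32_t)a * digit * (1 << (2 * i));   /* weight by 4^i         */
    }
    return acc;
}

int main(void)
{
    printf("%d\n", booth_radix4_mul(-123, 457));      /* prints -56211         */
    return 0;
}

Recoding the multiplier into signed radix-4 digits halves the number of partial products relative to a bit-serial scheme, which is why Booth encoding paired with a Wallace reduction tree shortens the multiplier's critical path.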
