Convolutional Neural Networks (CNNs) are used in a range of machine learning tasks, such as voice, image, and video processing. As the demand for faster response times in real-time applications grows, the need for high-speed implementations of CNNs becomes more significant. However, the convolutional layer of a CNN is computationally demanding and dominates the overall delay. This study therefore aims to design an efficient, fast convolution block for the hardware implementation of the CNN algorithm. The proposed solution uses a Bit-Level Multiplier and Accumulator (BLMAC) unit that incorporates a modified Booth encoder and a Wallace reduction tree to optimize timing. The BLMAC, the key component of the convolution process, is optimized for speed. The architecture occupies an area of 2761.517 μm², consumes 121.4 μW of power, and has a delay of 9.11 ns. The proposed architecture is therefore highly power-efficient, and this result is achieved without sacrificing area or delay. The BLMAC architecture is designed in Verilog; the design was simulated and its testbench verified in Cadence NCSim, and synthesis was performed with the Cadence Genus tool.
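
To illustrate the arithmetic the abstract describes, the sketch below shows a radix-4 (modified Booth) multiply-accumulate unit in Verilog. It is a minimal illustration only, not the paper's BLMAC: the module name booth_mac, the parameter W, and all port names are hypothetical, and the Booth-recoded partial products are summed with an ordinary adder loop rather than the Wallace reduction tree used in the proposed design.

    module booth_mac #(
        parameter W = 8              // operand width (assumed, even)
    )(
        input  wire                  clk,
        input  wire                  rst,   // synchronous reset
        input  wire                  en,    // accumulate enable
        input  wire signed [W-1:0]   a,     // multiplicand
        input  wire signed [W-1:0]   b,     // multiplier
        output reg  signed [2*W+3:0] acc    // running accumulator
    );
        integer i;
        reg        [W:0]     bext;  // multiplier with implicit b[-1] = 0
        reg        [2:0]     grp;   // overlapping 3-bit Booth group
        reg signed [W+1:0]   pp;    // recoded partial product: 0, +/-A, +/-2A
        reg signed [2*W-1:0] prod;

        always @(posedge clk) begin
            if (rst)
                acc <= 0;
            else if (en) begin
                bext = {b, 1'b0};
                prod = 0;
                // Radix-4 Booth: recode each overlapping group
                // {b[2i+1], b[2i], b[2i-1]} into one partial product
                // weighted by 4^i, halving the partial-product count.
                for (i = 0; i < W/2; i = i + 1) begin
                    grp = bext[2*i +: 3];
                    case (grp)
                        3'b001, 3'b010: pp = a;          // +A
                        3'b011:         pp = a <<< 1;    // +2A
                        3'b100:         pp = -(a <<< 1); // -2A
                        3'b101, 3'b110: pp = -a;         // -A
                        default:        pp = 0;          // 000 and 111
                    endcase
                    // The paper reduces the partial products with a Wallace
                    // tree; a sequential sum is used here only for clarity.
                    prod = prod + (pp <<< (2*i));
                end
                acc <= acc + prod;   // MAC step: accumulate the product
            end
        end
    endmodule

The appeal of this recoding, and presumably why the paper pairs it with a Wallace tree, is that it halves the number of partial products relative to a plain shift-and-add multiplier, so the reduction tree that follows has less work and a shorter critical path.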