Abstract

The energy efficiency of CNN-based inference engines depends predominantly on throughput (Giga-operations per second) and power consumption. A sparsity-aware accelerator compresses insignificant inputs (input feature maps and weights), skips the resulting ineffective computations, and thereby improves energy efficiency. Because exploiting sparsity in the weights can degrade inference accuracy, a sparse scheme for the Input Feature Maps (IFMs) is considered instead. A MATLAB-based, layer-wise sparsity analysis of the pre-trained CNN models AlexNet, VGG-16, VGG-19, ResNet-18, and ResNet-34 reveals that ∼18%–90% of the IFM values are zeros. In addition, the IFMs and weights adopt a 16-bit fixed/floating-point data format, which maintains accuracy within ∼97% of that obtained with Single-Precision Floating Point (SPFP). A 3 × 1 convolutional array with improved Zero-detect-Skip (CZS3×1) control units for the multiplier and adder/subtractor arrays is proposed. A modified rectified linear unit (ReLU) converts IFM values ≤ 0 to zero and sets the corresponding Detection-Bit (DB) to 1; these DBs select the mode of the zero-skip operations in CZS3×1. A 3 × 3 Compressed Processing Element (CPE) is built from three CZS3×1 units, and a 20-CPE convolution array architecture is implemented in a 65 nm technology library. The proposed CPE attains a performance of 90 Giga-operations per second (GOP/s) and an energy efficiency of 3.42 Tera-operations per second per watt (TOPS/W). With the improved control strategy, the CPE enhances performance by a factor of 2.45 while consuming, on average, 8.8 times less energy than state-of-the-art CNN accelerators.
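
To make the DB-driven zero-skip mechanism concrete, the sketch below is a loose NumPy illustration, not the paper's RTL implementation: a modified ReLU emits Detection-Bits alongside the clamped activations, and a DB-gated multiply-accumulate skips the ineffective products. The function names, array shapes, and data types here are assumptions for illustration only.

```python
import numpy as np

def modified_relu(ifm):
    """Illustrative modified ReLU (per the abstract): IFM values <= 0
    become zero and the matching Detection-Bit (DB) is set to 1,
    flagging that element as skippable in the zero-skip datapath."""
    db = (ifm <= 0).astype(np.uint8)   # DB = 1 marks a zero (skippable) activation
    out = np.where(ifm > 0, ifm, 0.0)  # standard ReLU clamp
    return out, db

def zero_skip_mac(ifms, weights, dbs):
    """Illustrative DB-gated multiply-accumulate: products whose IFM
    operand is flagged by DB = 1 are skipped rather than computed."""
    acc = 0.0
    for x, w, db in zip(ifms, weights, dbs):
        if db:           # skip the ineffective multiplication
            continue
        acc += x * w
    return acc

# Hypothetical usage: only the two nonzero activations are multiplied.
x = np.array([-1.5, 0.0, 2.0, 3.5])
w = np.array([0.2, 0.7, -0.1, 0.4])
act, db = modified_relu(x)           # act = [0, 0, 2.0, 3.5], db = [1, 1, 0, 0]
print(zero_skip_mac(act, w, db))     # 1.2; half the products were skipped
```

In the hardware described by the abstract this gating happens per cycle in the CZS3×1 control units rather than in software, but the data flow is the same: the DB decides whether a multiplier/adder operation fires.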
