Deep neural network (DNN) inference demands substantial computing power and therefore consumes significant energy. In convolution layers, a large number of output activations are negative and are set to zero by the ReLU activation function, so the computations that produce them are unnecessary and waste energy. This paper presents ECHO, a DNN inference accelerator designed for computation pruning. ECHO uses an unconventional arithmetic paradigm known as online, or most-significant-digit-first (MSDF), arithmetic, which performs computations in a digit-serial manner. Because digits are produced most significant first, successive operations can be overlapped, leading to substantial performance improvements. Coupled with a negative output detection scheme, online arithmetic enables early and precise recognition of negative outputs, so that unnecessary computations can be terminated in time and energy consumption reduced. The design is implemented on the Xilinx Virtex-7 VU3P FPGA and evaluated against contemporary methods using widely used performance metrics. The experimental results demonstrate promising power and performance improvements. In particular, compared to the conventional bit-serial design, the proposed design achieves average power-consumption improvements of up to 81%, 82.9%, and 40.6% for the VGG-16, ResNet-18, and ResNet-50 workloads, respectively, and average speedups of 2.39×, 2.6×, and 2.42× for the same models.
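The idea of detecting a negative output early from a most-significant-digit-first stream can be illustrated with a small software sketch. This is not ECHO's detection scheme, which the abstract does not specify; it only assumes a radix-2 signed-digit output stream with digits in {-1, 0, 1}, and the names `to_signed_digits` and `relu_with_early_termination` are hypothetical helpers introduced here for illustration.

```python
def to_signed_digits(value, n):
    """Hypothetical helper: encode an integer with |value| < 2**n as
    n radix-2 signed digits in {-1, 0, 1}, most significant digit first."""
    if abs(value) >= 2 ** n:
        raise ValueError("value does not fit in n digits")
    sign = -1 if value < 0 else 1
    magnitude = abs(value)
    return [sign * ((magnitude >> (n - 1 - i)) & 1) for i in range(n)]


def relu_with_early_termination(digits):
    """Consume signed digits MSD first and stop once the result is provably negative.

    Returns (relu_output, digits_consumed).
    """
    n = len(digits)
    partial = 0
    for j, d in enumerate(digits, start=1):
        partial += d * 2 ** (n - j)
        # Largest value the unseen digits could still add (all of them +1).
        max_remaining = 2 ** (n - j) - 1
        if partial + max_remaining < 0:
            # The final value must be negative, so ReLU yields 0:
            # the remaining digits never need to be computed.
            return 0, j
    return max(partial, 0), n


if __name__ == "__main__":
    print(relu_with_early_termination(to_signed_digits(-5, 4)))
    print(relu_with_early_termination(to_signed_digits(5, 4)))
```

In this sketch a negative result is recognized as soon as the digits seen so far outweigh anything the remaining digits could contribute, which is the property that makes MSDF streams attractive for pruning work ahead of ReLU. A positive result, by contrast, must consume every digit.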