Abstract

This article analyzes the effects of approximate multiplication when performing inferences on deep convolutional neural networks (CNNs). Approximate multiplication can reduce the cost of the underlying circuits so that CNN inferences can be performed more efficiently in hardware accelerators. The study identifies the critical factors in the convolution, fully-connected, and batch normalization layers that allow more accurate CNN predictions despite the errors from approximate multiplication. The same factors also provide an arithmetic explanation of why bfloat16 multiplication performs well on CNNs. The experiments are performed with recognized network architectures to show that the approximate multipliers can produce predictions that are nearly as accurate as the FP32 references, without additional training. For example, the ResNet and Inception-v4 models with Mitch-w6 multiplication produce Top-5 errors that are within 0.2 percent of the FP32 references. A brief cost comparison of Mitch-w6 against bfloat16 is presented, where a MAC operation saves up to 80 percent of energy compared to bfloat16 arithmetic. The most far-reaching contribution of this article is the analytical justification that multiplications can be approximated while additions need to be exact in CNN MAC operations.
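
The abstract's central claim, that products may be approximated as long as the accumulation stays exact, can be illustrated with a small numerical sketch. The snippet below is not the article's method and does not model the Mitch-w circuit; it simply perturbs each product with a small, roughly zero-mean relative error as a stand-in for an approximate multiplier, and shows that the error of the accumulated dot product stays a small fraction of the accumulated magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_multiply(a, b, rel_err=0.02):
    # Hypothetical approximate multiplier: the exact product perturbed by a
    # small, roughly zero-mean relative error. A stand-in for a real
    # approximate circuit, not a model of Mitch-w.
    prod = a * b
    noise = rng.uniform(-rel_err, rel_err, size=prod.shape)
    return prod * (1.0 + noise)

def mac_exact(weights, activations):
    # Exact multiply, exact accumulate (the FP32 reference).
    return np.sum(weights * activations)

def mac_approx(weights, activations):
    # Approximate multiply, exact accumulate: the setting studied in the article.
    return np.sum(approx_multiply(weights, activations))

# A toy 3x3x256 "convolution window": many products are summed per output.
w = rng.standard_normal(3 * 3 * 256)
x = rng.standard_normal(3 * 3 * 256)

ref = mac_exact(w, x)
apx = mac_approx(w, x)
scale = np.sum(np.abs(w * x))  # total accumulated magnitude
print(f"exact={ref:.3f}  approx={apx:.3f}  "
      f"accumulated error = {abs(apx - ref) / scale:.4%} of accumulated magnitude")
```

Because the per-product errors in this toy setup are roughly symmetric around zero, they tend to cancel during accumulation, which is the intuition behind the minimized variance of error discussed in the sections listed below.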

Highlights

  • The computational costs of convolutional neural networks (CNNs) have increased as CNNs get wider and deeper to perform better predictions for a variety of applications

  • The convolution layers in CNNs consist of a large number of multiply-accumulate (MAC) operations, and they take up the majority of computations for CNN inferences [11] (see the MAC-count sketch after this list)

  • The network dependency is the reason why more complex networks require a higher number of bits and see diminished benefits from aggressive quantization; approximate multiplication is orthogonal to quantization, as approximate multipliers may be designed for any number of bits, and it complements quantization to maximize the computational efficiency of CNN inferences

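To make the second highlight concrete, the sketch below counts MAC operations for a convolution layer and a fully-connected layer. The layer shapes are illustrative only and are not taken from the article, but they show why convolution layers dominate the computation of a CNN inference.

```python
def conv2d_macs(out_h, out_w, in_ch, out_ch, k):
    # MAC count of a standard convolution layer:
    # one k*k*in_ch dot product per output element.
    return out_h * out_w * out_ch * in_ch * k * k

def fc_macs(in_features, out_features):
    # MAC count of a fully-connected layer.
    return in_features * out_features

# Illustrative shapes (not from the article): a ResNet-style 3x3 convolution
# on a 56x56x64 feature map vs. a 2048->1000 classifier layer.
conv = conv2d_macs(out_h=56, out_w=56, in_ch=64, out_ch=64, k=3)
fc = fc_macs(in_features=2048, out_features=1000)
print(f"conv MACs: {conv:,}")   # 115,605,504
print(f"fc   MACs: {fc:,}")     # 2,048,000
```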

Summary

INTRODUCTION

The computational costs of convolutional neural networks (CNNs) have increased as CNNs get wider and deeper to perform better predictions for a variety of applications. Some techniques are computationally expensive because they optimize their methods for each network model, or retrain networks to compensate for the performance degradation their methods introduce [5], [6]. Many techniques, such as [7], are only effective for small networks and cannot scale to deeper CNNs, as they report much worse performance results when tested on deeper networks. One promising hardware-based approach is the application of approximate multiplication to CNN inference [9]. It involves designing and applying multiplication circuits that have reduced hardware costs but produce results that are not exact. While optimizing CNN inference through approximate multiplication was demonstrated in several previous studies, there was limited understanding of why it worked well for CNNs. The promising results led to the general observation that CNNs were resilient against small arithmetic errors, but none of those studies identified the complete reason behind that resilience. This article also discusses the potential cost benefits of the methodology by briefly comparing its hardware costs against those of bfloat16 arithmetic.
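
For context, the Mitch-w multipliers mentioned in the abstract build on Mitchell's logarithmic multiplication, which replaces a multiplication by an addition of approximate logarithms. The Python sketch below is only a software illustration of the classic Mitchell approximation for positive integers; the actual Mitch-w circuits add operand truncation (the w parameter) and other hardware optimizations that are not modeled here.

```python
def mitchell_multiply(a: int, b: int) -> int:
    # Mitchell's approximation: treat a = 2^k1 * (1 + x1) and b = 2^k2 * (1 + x2),
    # approximate log2(a) by k1 + x1, add the two approximate logs,
    # then take the approximate antilog.
    if a == 0 or b == 0:
        return 0
    k1 = a.bit_length() - 1          # characteristic: floor(log2(a))
    k2 = b.bit_length() - 1
    x1 = a / (1 << k1) - 1.0         # mantissa fraction in [0, 1)
    x2 = b / (1 << k2) - 1.0
    s = x1 + x2
    if s < 1.0:                      # antilog approximation, no mantissa carry
        approx = (1.0 + s) * (1 << (k1 + k2))
    else:                            # mantissa carry: scale by an extra power of two
        approx = s * (1 << (k1 + k2 + 1))
    return int(approx)

for a, b in [(7, 9), (100, 200), (1234, 5678)]:
    exact = a * b
    approx = mitchell_multiply(a, b)
    print(f"{a} * {b}: exact={exact}, mitchell={approx}, "
          f"error={(approx - exact) / exact:.2%}")
```

Note that Mitchell's approximation always underestimates the exact product; how such biased errors interact with convolution, fully-connected, and batch normalization layers is the subject of the sections that follow.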

PRELIMINARIES
ACCUMULATED ERROR IN CONVOLUTION
Understanding Convolution and FC Layers
Minimized Variance of Error
Impact on Convolution and FC
Grouped and Depthwise Convolutions
EFFECT OF BATCH NORMALIZATION
ARITHMETIC REASON FOR BFLOAT16 SUCCESS
EXPERIMENTS
Impact of Approximate Multiplication on CNNs
Effect of Batch Normalization
COMPARISON OF COSTS
RELATED WORKS
Findings
CONCLUSION