Abstract

Over the last ten years, the rise of deep learning has redefined the state of the art in many computer vision and natural language processing tasks, with applications ranging from automated personal assistants and social network filtering to self-driving cars and drug development. The growth in popularity of these algorithms is rooted in the exponential increase of computing power available for their training, a consequence of the diffusion of GPUs. The resulting increase in accuracy created demand for faster, more power-efficient hardware suited for deployment on edge devices. In this thesis, we propose a set of innovations and technologies belonging to one of the many research lines sparked by this demand, focusing on energy-efficient hardware for convolutional neural networks.

We first study how a standard 28 nm CMOS process performs in the context of deep learning accelerator design, giving special consideration to the power and area of standard-cell-based circuits when reduced-precision arithmetic and short SRAM words are used. This analysis shows that the power-efficiency gain from reducing bit precision is non-linear and saturates at a precision of 16 bits.

We then propose Nullhop, an accelerator pioneering the use of quantization and of the feature map sparsity typical of convolutional neural networks to boost hardware capabilities. Nullhop's novelty is its ability to skip every multiplication involving a zero-valued activation. It reaches a power efficiency of 3 TOP/s/W with a throughput of almost 0.5 TOP/s in 6.3 mm².

We present a neural network quantization algorithm based on a hardware-software co-design approach and demonstrate its capabilities by training several networks on tasks such as classification, object detection, segmentation, and image generation. The quantization scheme is implemented in Elements, a convolutional neural network accelerator architecture that supports variable weight bit precision as well as sparsity. We demonstrate the capabilities of Elements with multiple design parameterizations suited for a wide range of applications; one of these, called Deuterium, reaches an energy efficiency of over 4 TOP/s/W using only 3.3 mm².

We further explore the concept of sparsity with a third convolutional neural network accelerator architecture, called TwoNullhop, able to skip zeros in both feature maps and kernels. We test the TwoNullhop architecture with Carbon, an accelerator that, despite having only 128 multiply-accumulate units and running at a frequency of only 500 MHz, achieves more than 2.4 TOP/s with an energy efficiency of 10.2 TOP/s/W in only 4 mm².

The thesis ends with an overview of the challenges and possibilities we foresee in the future of deep learning hardware development, and attempts to predict which themes will dominate the field in the years to come.
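To make the zero-skipping idea concrete, the following is a minimal software sketch of a sparsity-aware multiply-accumulate loop, in the spirit of Nullhop (skipping zero activations) and TwoNullhop (skipping zeros in both activations and kernel weights). The function name and structure are illustrative assumptions only and do not correspond to the thesis's actual hardware or code; the real accelerators implement this skipping in dedicated logic, not in software.

```python
import numpy as np

def sparse_dot(activations, weights, skip_weight_zeros=False):
    """Accumulate products while skipping every multiplication whose
    activation operand is zero (Nullhop-style); optionally also skip
    zero-valued kernel weights (TwoNullhop-style).

    Illustrative sketch only, not the thesis's implementation.
    """
    acc = 0
    mults = 0  # number of multiplications actually performed
    for a, w in zip(activations, weights):
        if a == 0:
            continue                      # zero activation: skip the MAC entirely
        if skip_weight_zeros and w == 0:
            continue                      # zero kernel weight: skip as well
        acc += a * w
        mults += 1
    return acc, mults

# Example: a post-ReLU feature map patch is typically very sparse.
acts = np.array([0, 3, 0, 0, 5, 0, 0, 1])
ws   = np.array([2, 0, 1, 4, 1, 0, 3, 2])

print(sparse_dot(acts, ws))                          # performs 3 of 8 MACs
print(sparse_dot(acts, ws, skip_weight_zeros=True))  # performs 2 of 8 MACs
```

The result of the dot product is unchanged by the skipping; only the number of multiplications performed, and hence the energy spent, is reduced.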
