Abstract

Deep neural networks (DNNs) have achieved unprecedented success in computer vision. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their application on resource-constrained devices such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that can lift this efficiency bottleneck while preserving the high accuracy of DNNs are in great demand to enable numerous edge AI applications. This chapter provides an overview of efficient deep learning methods. We begin by introducing popular model compression methods, including pruning, factorization, and quantization. We then describe compact model design techniques, including efficient convolution layers and representative efficient CNN architectures. Finally, to reduce the large design cost of these manual solutions, we discuss AutoML frameworks for each of them, such as neural architecture search (NAS) and automated pruning and quantization.
