Abstract

Deep learning is an effective machine learning approach capable of learning deep representations of data with high accuracy. The outstanding performance of deep learning models comes with a deep stack of network layers that demands high computational energy and adds latency overhead to the system. Inference of a deep neural network (DNN) completes and delivers output only after processing all of the network layers, irrespective of the input pattern. This complexity prohibits the use of DNNs in energy-constrained, low-latency real-time applications. A possible solution is multi-exit neural networks, which introduce multiple exit branches into standard neural networks. These early-exit neural networks deliver output from their intermediate layers through exit points based on specific confidence criteria. The majority of input samples can be processed at the initial layers of the network, while more complex input samples are forwarded to the remaining layers. This paper analyzes the performance of early-exit deep neural networks against their confidence criteria and the number of exit branches. The study also evaluates the classification accuracy across exit branches. For the analysis, we implement an object detection application using an early-exit MobileNetV2 neural network and the Caltech-256 dataset. The experiments show that an early-exit DNN can speed up the inference process with acceptable accuracy, and that the selection of confidence criteria has a significant impact on system performance.
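
The early-exit mechanism described above can be illustrated with a minimal sketch of confidence-based inference. This is not the paper's implementation: the names backbone_blocks and exit_heads are hypothetical stand-ins for a backbone split into stages with attached classifier branches, and softmax confidence with a fixed threshold is assumed as the exit criterion.

```python
# Minimal sketch of early-exit inference with a softmax-confidence
# threshold (assumed criterion; the paper may use a different one).
import torch
import torch.nn.functional as F

def early_exit_inference(x, backbone_blocks, exit_heads, threshold=0.9):
    """Run backbone stages sequentially; return at the first exit branch
    whose top-class softmax confidence meets `threshold`, otherwise fall
    through to the final exit. Assumes a batch of size 1."""
    for i, (block, head) in enumerate(zip(backbone_blocks, exit_heads)):
        x = block(x)                        # next backbone stage
        logits = head(x)                    # exit-branch classifier
        probs = F.softmax(logits, dim=-1)
        confidence, prediction = probs.max(dim=-1)
        if confidence.item() >= threshold or i == len(exit_heads) - 1:
            return prediction.item(), i     # predicted class, exit index
```

Lowering the threshold lets more samples exit early (faster inference, lower accuracy), while raising it pushes samples deeper into the network, which is the trade-off the paper evaluates.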
