Abstract

Convolutional Neural Network (CNN) inference is a demanding computational task in which a long sequence of operations is applied to an input as dictated by the network topology. Optimisations such as data quantisation, data reuse, network pruning, and dedicated hardware architectures strongly reduce both energy consumption and hardware resource requirements, and improve inference latency. Building new applications on established models from academia and industry is now common practice. Further optimisations that preserve the model architecture have been proposed via early-exit approaches, where additional exit points are inserted so that samples whose feature maps already carry sufficient evidence can be classified before reaching the final model exit. This paper proposes a methodology for designing early-exit networks from a given baseline model, aiming to improve the average latency for a targeted subset of classes while maintaining the original accuracy over all classes. Results demonstrate average time savings of 2.09× to 8.79× on CIFAR-10 and 15.00× to 20.71× on CIFAR-100 for the baseline models ResNet-21, ResNet-110, Inceptionv3-159, and DenseNet-121.
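The early-exit mechanism described above can be illustrated with a minimal sketch: intermediate classification heads are evaluated in order, and a sample leaves the network at the first exit whose softmax confidence clears a threshold. The stage/head structure, the threshold value, and the confidence criterion (maximum softmax probability) are illustrative assumptions, not the paper's specific design.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_predict(x, exits, threshold=0.9):
    """Pass a sample through successive stages; return (label, exit_index)
    at the first exit head whose top softmax probability meets the threshold.
    The final exit always classifies, so every sample gets a label."""
    for i, (stage, head) in enumerate(exits):
        x = stage(x)                  # compute this stage's feature map
        probs = softmax(head(x))      # classify at this exit point
        if probs.max() >= threshold or i == len(exits) - 1:
            return int(probs.argmax()), i

# Toy two-exit pipeline (hypothetical stages and heads, for illustration only):
exits = [
    (lambda x: x * 2, lambda x: np.array([x.sum(), x.sum()])),  # uniform logits -> unconfident
    (lambda x: x + 1, lambda x: np.array([x.sum(), 0.0])),      # confident in class 0
]
label, used_exit = early_exit_predict(np.ones(4), exits, threshold=0.9)
```

In this toy run the first head produces equal logits (confidence 0.5), so the sample proceeds to the second, confident exit; latency savings arise when "easy" samples stop at early heads instead of traversing the whole network.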
