Abstract

Embedded Convolutional Neural Networks (ConvNets) are driving the evolution of ubiquitous systems that can sense and understand the environment autonomously. Due to their high complexity, aggressive compression is needed to meet the specifications of portable end-nodes. A variety of algorithmic optimizations are available today, from custom quantization and filter pruning to modular topology scaling, which enable fine-tuning of the hyperparameters and the right balance between quality, performance and resource usage. Nonetheless, the implementation of systems capable of sustaining continuous inference over a long period is still a primary source of concern since the limited thermal design power of general-purpose embedded CPUs prevents execution at maximum speed. Neglecting this aspect may result in substantial mismatches and the violation of the design constraints. The objective of this work was to assess topology scaling as a design knob to control the performance and the thermal stability of inference engines for image classification. To this aim, we built a characterization framework to inspect both the functional (accuracy) and non-functional (latency and temperature) metrics of two ConvNet models, MobileNet and MnasNet, ported onto a commercial low-power CPU, the ARM Cortex-A15. Our investigation reveals that different latency constraints can be met even under continuous inference, yet with a severe accuracy penalty forced by thermal constraints. Moreover, we empirically demonstrate that thermal behavior does not benefit from topology scaling as the on-chip temperature still reaches critical values affecting reliability and user satisfaction.

Highlights

  • Recent advances in deep learning have enabled the deployment of Convolutional Neural Networks (ConvNets) on tiny end-nodes powered by general-purpose cores

  • Among the many existing models, in this work we picked two representative examples considered state-of-the-art: (i) MobileNet, which introduces a convolution operator that decouples spatial and cross-channel correlations to reduce the computational complexity of the network; (ii) MnasNet, whose topology is built through an automated neural architecture search, driven by reinforcement learning, that identifies the mobile ConvNet achieving the best trade-off between accuracy and inference latency

  • The analysis was conducted for all the pairs (α, ρ) in order to identify the topology configurations that strictly satisfy the target latency constraint (Lt) during the whole continuous-inference run


Summary

Introduction

Recent advances in deep learning have enabled the deployment of Convolutional Neural Networks (ConvNets) on tiny end-nodes powered by general-purpose cores. ConvNets were mainly optimized to improve accuracy; this led to deeper and more complex models with increased size [13,14,15]. Among the many existing models, in this work we picked two representative examples considered state-of-the-art: (i) MobileNet, which introduces a convolution operator that decouples spatial and cross-channel correlations to reduce the computational complexity of the network; (ii) MnasNet, whose topology is built through an automated neural architecture search, driven by reinforcement learning, that identifies the mobile ConvNet achieving the best trade-off between accuracy and inference latency. MnasNet can be considered the latest evolution, where the design is automated through an intelligent algorithm that improves over handcrafted solutions.
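As a rough illustration (not code from the paper), the multiply-add savings that MobileNet obtains by decoupling spatial and cross-channel correlations, together with the effect of the width (α) and resolution (ρ) scaling knobs analyzed in this work, can be sketched as follows. The layer dimensions used in the example (kernel size, channel counts, feature-map size) are arbitrary placeholders, not values from the paper:

```python
def standard_conv_cost(dk, m, n, df):
    # Standard convolution: dk*dk*m*n multiply-adds per output pixel,
    # over a df x df output feature map.
    return dk * dk * m * n * df * df

def separable_conv_cost(dk, m, n, df):
    # Depthwise separable convolution = depthwise spatial filtering
    # (dk*dk*m*df*df) followed by a 1x1 pointwise cross-channel
    # combination (m*n*df*df).
    return dk * dk * m * df * df + m * n * df * df

def scaled_separable_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    # Topology scaling: alpha thins the channel counts, rho shrinks
    # the input resolution (and hence the feature-map size).
    return separable_conv_cost(dk, int(alpha * m), int(alpha * n), int(rho * df))

# Example layer: 3x3 kernels, 64 -> 128 channels, 56x56 feature map.
full = standard_conv_cost(3, 64, 128, 56)
sep = separable_conv_cost(3, 64, 128, 56)
print(f"standard: {full}, separable: {sep}, ratio: {sep / full:.3f}")
print("alpha=0.5, rho=0.5:", scaled_separable_cost(3, 64, 128, 56, 0.5, 0.5))
```

The ratio between the two costs reduces to 1/n + 1/dk², which is why 3x3 separable layers cut the multiply-adds by roughly 8-9x; α and ρ then scale the remaining cost roughly quadratically, which is the design space swept by the (α, ρ) pairs in this study.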

