Abstract

Motivated by the prospects of 5G communications and the industrial Internet of Things (IoT), recent years have seen the rise of a new computing paradigm, edge computing, which shifts data analytics to network edges in close proximity to big data sources. Although deep neural networks (DNNs) are used extensively across platforms and scenarios, they are usually both compute- and memory-intensive and thus difficult to deploy on resource-limited edge devices and in performance-demanding edge applications. Hence, there is an urgent need for techniques that enable DNN models to fit into edge devices while ensuring acceptable execution costs and inference accuracy. This article proposes an on-demand DNN model inference system for industrial edge devices, called knowledge distillation and early exit on edge (EdgeKE). It focuses on two design knobs: first, DNN compression based on knowledge distillation, which trains compact edge models under the supervision of large complex models to improve accuracy and speed; second, DNN acceleration based on early exit, which provides flexible choices for satisfying distinct latency or accuracy requirements from edge applications. Extensive evaluations on the CIFAR100 dataset across three state-of-the-art edge devices demonstrate that EdgeKE significantly outperforms the baseline models in terms of inference latency and memory footprint while maintaining competitive classification accuracy. Furthermore, EdgeKE adapts efficiently to application requirements on inference performance: accuracy loss stays within 4.84% under various latency constraints, and the speedup ratio reaches up to 3.30× under various accuracy requirements.
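To make the two design knobs concrete, the following is a minimal PyTorch-style sketch, not the paper's implementation: a standard knowledge-distillation loss (Hinton-style soft targets plus hard labels) and a confidence-thresholded early-exit inference loop. The names distillation_loss, early_exit_inference, model.exits, T, alpha, and exit_threshold are hypothetical and assumed for illustration only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soft-target term: KL divergence between temperature-scaled
        # student and teacher distributions, rescaled by T^2.
        # T and alpha are illustrative defaults, not values from the paper.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-label term: ordinary cross-entropy on ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    @torch.no_grad()
    def early_exit_inference(model, x, exit_threshold=0.8):
        # model.exits is an assumed interface: an ordered iterable of
        # (backbone_stage, classifier_head) pairs, shallowest first.
        # Return the first prediction whose softmax confidence clears
        # the threshold; deeper stages are skipped entirely.
        h = x
        for stage, head in model.exits:
            h = stage(h)
            logits = head(h)
            conf, pred = F.softmax(logits, dim=1).max(dim=1)
            if conf.item() >= exit_threshold:  # assumes batch size 1
                return pred, logits
        return pred, logits  # fell through to the final (deepest) exit

Raising exit_threshold trades latency for accuracy (more inputs reach deeper exits), which is how a scheme like this can be tuned to distinct per-application latency or accuracy requirements.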
