Abstract
The quality of service (QoS) of intelligent applications on mobile devices depends heavily on the inference speed of Deep Neural Network (DNN) models. Cooperative DNN inference has become an efficient way to reduce inference latency. In cooperative inference, a mobile device offloads a part of its inference task to cloud servers. The large communication volume is usually the bottleneck of such systems. Prior research focuses on reducing the communication volume by finding optimal partition points. We observe that the computation and communication resources on mobile devices can work as a pipeline, which hides the communication time behind computation and further reduces the inference latency. Based on this observation, we formulate the offloading pipeline scheduling problem. We aim to find the optimal sequence of DNN execution and offloading for mobile devices such that the inference latency is minimized. When a DNN is modeled as a directed acyclic graph (DAG), the complex precedence constraints in the DAG make this problem challenging. Noticing that most DNN models consist of independent paths or tree structures, we present an optimal path-wise DAG scheduler and an optimal layer-wise scheduler for tree-structured DAGs. We then propose a heuristic based on topological sort to schedule DAGs of general structure. A prototype of our offloading scheme is implemented on a real-world testbed, where a Raspberry Pi serves as the mobile device and lab PCs serve as the cloud. Various DNN models are tested, and our scheme reduces their inference latencies in different network environments.
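To make the pipelining idea concrete, below is a minimal Python sketch of a topological-sort-based greedy heuristic in the spirit of the general-DAG scheduler summarized above. Everything in it is an illustrative assumption rather than the paper's exact algorithm: the per-node costs (`device_time`, `server_time`, `out_bytes`), the uplink `bandwidth`, and the three-stage pipeline model (device CPU, uplink, server) are hypothetical, and downlink traffic is ignored.

```python
"""Hedged sketch: topological-sort-based pipeline offloading heuristic.
The cost model and all parameter names are illustrative assumptions,
not the paper's formulation."""
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    device_time: float  # execution time on the mobile device (s)
    server_time: float  # execution time on the cloud server (s)
    out_bytes: float    # size of this node's output tensor (bytes)


def topo_order(nodes, edges):
    """Kahn's algorithm; `edges` is a list of (u, v) name pairs."""
    succ = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order


def greedy_pipeline_schedule(nodes, edges, bandwidth, input_bytes):
    """Greedily place each node (in topological order) on the device or
    the cloud, tracking when the device CPU, the uplink, and the server
    each become free so transmission overlaps with local computation.
    Downlink transfers of cloud results back to the device are ignored
    for brevity."""
    preds = {n: [u for u, v in edges if v == n] for n in nodes}
    cpu_free = link_free = server_free = 0.0
    finish, placement = {}, {}
    for name in topo_order(nodes, edges):
        node = nodes[name]
        ready = max((finish[p] for p in preds[name]), default=0.0)
        # Option 1: run locally, occupying only the device CPU.
        local_done = max(cpu_free, ready) + node.device_time
        # Option 2: upload the inputs still resident on the device,
        # then run on the server.
        if preds[name]:
            # Outputs already on the cloud need not be re-sent.
            tx_bytes = sum(nodes[p].out_bytes for p in preds[name]
                           if placement[p] == "device")
        else:
            tx_bytes = input_bytes  # source node: upload the raw input
        tx_done = max(link_free, ready) + tx_bytes / bandwidth
        remote_done = max(server_free, tx_done) + node.server_time
        if local_done <= remote_done:
            placement[name], finish[name] = "device", local_done
            cpu_free = local_done
        else:
            placement[name], finish[name] = "cloud", remote_done
            link_free, server_free = tx_done, remote_done
    return placement, max(finish.values())


# Toy three-layer chain (conv -> pool -> fc) over a 1 MB/s uplink.
nodes = {
    "conv": Node("conv", device_time=0.30, server_time=0.03, out_bytes=2e5),
    "pool": Node("pool", device_time=0.05, server_time=0.01, out_bytes=5e4),
    "fc":   Node("fc",   device_time=0.20, server_time=0.02, out_bytes=1e3),
}
edges = [("conv", "pool"), ("pool", "fc")]
placement, latency = greedy_pipeline_schedule(nodes, edges,
                                              bandwidth=1e6, input_bytes=3e5)
print(placement, f"end-to-end latency = {latency:.3f} s")
```

The point of the sketch is the overlap: the uplink's busy time (`link_free`) advances independently of the device CPU's (`cpu_free`), so an offloaded tensor's transmission hides behind subsequent local layer computation, which is the pipelining effect the abstract exploits.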