Abstract

Deep Neural Networks (DNNs) have been widely deployed in mobile applications. DNN inference latency is a critical metric for measuring the service quality of those applications. Collaborative inference is a promising approach for latency optimization, where partial inference workloads are offloaded from mobile devices to cloud servers. Model partition problems for collaborative inference have been well studied. However, little attention has been paid to optimizing the offloading pipeline for multiple DNN inference jobs. In practice, mobile devices usually need to process multiple DNN inference jobs simultaneously. We propose to jointly optimize DNN partitioning and pipeline scheduling for multiple inference jobs. We theoretically analyze the optimal scheduling conditions for homogeneous chain-structured DNNs. Based on the analysis, we propose near-optimal partitioning and scheduling methods for chain-structured DNNs, and we extend those methods to DNNs with general structures. In addition, we extend the problem scenario to heterogeneous DNN inference jobs and propose a layer-level scheduling algorithm. Theoretical analyses show that our proposed method is optimal when the computation graphs are tree-structured. Our joint optimization methods are evaluated in a real-world testbed. Experimental results show that our methods significantly reduce the overall inference latency of multiple inference jobs compared to partition-only or schedule-only approaches.
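To make the partitioning side of the problem concrete, the sketch below illustrates the basic single-job cut-point search for a chain-structured DNN: layers before the cut run on the device, the cut-point tensor is transmitted, and the remaining layers run on the server. This is only a minimal illustration under assumed per-layer timings; it is not the paper's joint partitioning-and-scheduling method, which additionally pipelines multiple jobs across the device, the network, and the server.

```python
def best_partition(device_ms, server_ms, transfer_ms):
    """Brute-force cut-point search for one chain-structured DNN.

    device_ms[i]   -- latency of layer i on the mobile device (placeholder values)
    server_ms[i]   -- latency of layer i on the cloud server
    transfer_ms[k] -- time to send the tensor that feeds layer k
                      (k = 0 is the raw input)
    Cut point k runs layers 0..k-1 on the device and layers k..n-1 on the server;
    k = n keeps the whole job on the device.
    """
    n = len(device_ms)
    best = (float("inf"), None)
    for k in range(n + 1):
        latency = sum(device_ms[:k]) + sum(server_ms[k:])
        if k < n:                      # some layers still run on the server
            latency += transfer_ms[k]  # ship the cut-point tensor
        best = min(best, (latency, k))
    return best  # (end-to-end latency in ms, chosen cut point)


# Toy example with made-up per-layer timings for a 4-layer chain.
print(best_partition(device_ms=[8, 12, 20, 6],
                     server_ms=[2, 3, 5, 1],
                     transfer_ms=[15, 9, 4, 2]))
```

With multiple concurrent jobs, minimizing each job's latency in isolation like this is no longer sufficient, because the jobs contend for the device, the uplink, and the server; that coupling is what the joint partitioning and pipeline scheduling in the paper addresses.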
