Delay-Aware DNN Inference Throughput Maximization in Edge Computing via Jointly Exploring Partitioning and Parallelism

Jing Li, Xiaohua Jia, Zichuan Xu, Yuchen Li, Weifa Liang

doi:10.1109/lcn52139.2021.9524928

Abstract

Mobile Edge Computing (MEC) has emerged as a promising paradigm for coping with the explosive growth of mobile applications by offloading compute-intensive tasks to an MEC network for processing. The surge of deep learning is reshaping the prospects of the intelligent Internet of Things (IoT), and edge intelligence has arisen to provide real-time deep neural network (DNN) inference services for users. To accelerate the DNN inference of a request in an MEC network, the DNN inference model can usually be partitioned into two connected parts: one part is processed on the local IoT device of the request, and the other part is processed on a cloudlet (server) in the MEC network. The DNN inference can be further accelerated by allocating multiple threads of the cloudlet to which the request is assigned. In this paper, we study a novel delay-aware DNN inference throughput maximization problem that aims to maximize the number of delay-aware DNN service requests admitted, by accelerating each DNN inference through jointly exploring DNN model partitioning and multi-thread parallelism of DNN inference. To this end, we first show that the problem is NP-hard. We then devise a constant approximation algorithm for it. We finally evaluate the performance of the proposed algorithm through experimental simulations. Experimental results demonstrate that the proposed algorithm is promising.
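To make the interplay between partitioning and multi-thread parallelism concrete, the following is a minimal single-request sketch and not the approximation algorithm from the paper. It assumes a simple additive delay model (local compute, then upload, then cloudlet compute) and an Amdahl-style speedup for cloudlet threads. The function name `best_partition` and all of its parameters are hypothetical and introduced only for illustration.

```python
from typing import List, Optional, Tuple


def best_partition(
    local_ms: List[float],     # assumed per-layer latency on the IoT device
    edge_ms: List[float],      # assumed per-layer single-thread latency on the cloudlet
    transfer_ms: List[float],  # transfer_ms[k]: assumed upload time if the cut is before layer k
    deadline_ms: float,        # delay requirement of the request
    max_threads: int,          # threads available on the cloudlet
    parallel_fraction: float = 0.9,  # assumed parallelizable share (Amdahl's law)
) -> Optional[Tuple[int, int, float]]:
    """Return (cut, threads, delay) using the fewest threads that meet the deadline.

    Layers [0, cut) run on the device and layers [cut, n) run on the cloudlet.
    cut == n means the whole model runs locally, so nothing is uploaded.
    Returns None if no cut and thread count satisfies the deadline.
    """
    n = len(local_ms)
    if len(edge_ms) != n or len(transfer_ms) != n:
        raise ValueError("all per-layer lists must have the same length")

    for threads in range(1, max_threads + 1):
        # Amdahl-style speedup for the cloudlet part (a modelling assumption).
        speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)
        best_cut, best_delay = None, float("inf")
        for cut in range(n + 1):
            delay = float(sum(local_ms[:cut]))
            if cut < n:
                delay += transfer_ms[cut] + sum(edge_ms[cut:]) / speedup
            if delay < best_delay:
                best_cut, best_delay = cut, delay
        if best_delay <= deadline_ms:
            return best_cut, threads, best_delay
    return None


# Toy numbers, chosen only to exercise the function.
result = best_partition(
    local_ms=[10, 10, 10],
    edge_ms=[2, 2, 2],
    transfer_ms=[20, 5, 1],
    deadline_ms=18,
    max_threads=4,
    parallel_fraction=1.0,
)
print(result)
```

Trying thread counts in increasing order keeps the per-request thread allocation as small as possible, which leaves more cloudlet capacity for other requests. Deciding which requests to admit across many cloudlets is the NP-hard part that the paper addresses, and this sketch does not attempt it.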

Similar Papers

Throughput Maximization of Delay-Aware DNN Inference in Edge Computing by Exploring DNN Model Partitioning and Inference Parallelism
Jing Li ... Weifa Liang
IEEE Transactions on Mobile Computing | VOL. 22
01 May 2023

Joint Optimization of DNN Partition and Continuous Task Scheduling for Digital Twin-Aided MEC Network With Deep Reinforcement Learning
Siyu Yuan ... Qin Li
IEEE Access | VOL. 11
01 Jan 2023

IGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud
Fei Xu ... Ruitao Shang
IEEE Transactions on Parallel and Distributed Systems | VOL. 34
01 Mar 2023

Deep Reinforcement Learning Based Resource Management for DNN Inference in Industrial IoT
Weiting Zhang ... Hongke Zhang
IEEE Transactions on Vehicular Technology | VOL. 70
24 Mar 2021