Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading

Thaha Mohammed,Carlee Joe-Wong,Rohit Babbar,Mario Di Francesco

doi:10.1109/infocom41043.2020.9155237

Abstract

Deep neural networks (DNNs) are the de facto solution behind many of today's intelligent applications, ranging from machine translation to autonomous driving. DNNs are accurate but resource-intensive, especially for embedded devices such as mobile phones and smart objects in the Internet of Things. To overcome these resource constraints, DNN inference is generally offloaded to the edge or to the cloud. This is accomplished by partitioning the DNN and distributing the computations between the two ends. However, most existing solutions simply split the DNN into two parts, one running locally or at the edge, and the other in the cloud. In contrast, this article proposes a technique to divide a DNN into multiple partitions that can be processed locally by end devices or offloaded to one or more powerful nodes, such as in fog networks. The proposed scheme includes both an adaptive DNN partitioning scheme and a distributed algorithm to offload computations based on a matching game approach. Results obtained using a self-driving car dataset and several DNN benchmarks show that the proposed solution significantly reduces the total latency of DNN inference compared to other distributed approaches and is 2.6 to 4.2 times faster than the state of the art.

Similar Papers

Joint Optimization of DNN Partition and Continuous Task Scheduling for Digital Twin-Aided MEC Network With Deep Reinforcement Learning
Siyu Yuan ... Qin Li
IEEE Access | VOL. 11
01 Jan 2023

An adaptive DNN inference acceleration framework with end–edge–cloud collaborative computing
Guozhi Liu ... Muhammad Bilal
Future Generation Computer Systems | VOL. 140
04 Nov 2022

PArtNNer: Platform-Agnostic Adaptive Edge-Cloud DNN Partitioning for Minimizing End-to-End Latency
Soumendu Kumar Ghosh ... Anand Raghunathan
ACM Transactions on Embedded Computing Systems | VOL. 23
10 Jan 2024

DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing
Chao Li ... Yang Xu
01 Jan 2020
