Abstract

Deep neural network (DNN) inference on streaming data requires computing resources that satisfy the inference throughput requirements. However, latency- and privacy-sensitive deep learning applications cannot afford to offload computation to remote clouds, because of the transmission cost involved and the lack of trust in third-party cloud providers. Among the solutions for increasing performance while keeping computation in a constrained environment, hardware acceleration can be costly, and model optimization requires extensive design effort while potentially degrading accuracy. DNN partitioning is a third, complementary approach: it distributes the inference workload over several available edge devices, taking into account the edge network properties and the DNN structure, with the objective of maximizing the inference throughput (number of inferences per second). This paper introduces a method to predict inference and transmission latencies for multi-threaded distributed DNN deployments, and defines an optimization process to maximize the inference throughput. A branch-and-bound solver is then presented and analyzed to quantify its performance and complexity. This analysis has led to the definition of the acceleration region, which describes deterministic conditions on the DNN and network properties under which DNN partitioning is beneficial. Finally, experimental results confirm the simulations and show inference throughput improvements in sample edge deployments.
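To make the optimization concrete, below is a minimal sketch of throughput-maximizing partitioning under a simplified pipeline model: with one contiguous stage per device and multi-threaded overlap of compute and transfer, steady-state throughput is the reciprocal of the slowest stage or link, and a branch-and-bound search prunes cut placements that cannot beat the best partition found so far. All latencies, activation sizes, bandwidth values, and names here are hypothetical placeholders, and the bound is far simpler than the formulation developed in the paper.

```python
# Illustrative inputs (all values hypothetical): per-layer compute latency (ms),
# activation size at each cut point (KB), and link bandwidth (KB/ms).
layer_ms = [4.0, 6.0, 3.0, 5.0, 2.0]       # compute latency of each DNN layer
cut_kb = [12.0, 48.0, 8.0, 24.0]           # activation size after layers 0..3
link_kb_per_ms = 10.0                      # edge-network link bandwidth
max_devices = 3

# Incumbent: run the whole model on a single device (no partitioning).
best = {"period": sum(layer_ms), "cuts": ()}

def search(next_layer, stages_left, period_so_far, cuts):
    """Branch on the position of the next cut; bound with an averaging argument.

    `period_so_far` is the slowest stage/link fixed so far; spreading the
    remaining layers evenly over `stages_left` stages lower-bounds the final
    pipeline period, so subtrees that cannot beat the incumbent are pruned.
    """
    remaining = sum(layer_ms[next_layer:])
    if max(period_so_far, remaining / stages_left) >= best["period"]:
        return  # prune: this subtree cannot improve on the incumbent
    if stages_left == 1:                   # remaining layers form the last stage
        best["period"] = max(period_so_far, remaining)
        best["cuts"] = cuts
        return
    for cut in range(next_layer, len(layer_ms) - stages_left + 1):
        stage_ms = sum(layer_ms[next_layer:cut + 1])
        link_ms = cut_kb[cut] / link_kb_per_ms
        search(cut + 1, stages_left - 1,
               max(period_so_far, stage_ms, link_ms), cuts + (cut,))

for k in range(2, max_devices + 1):        # pipelines of 2..max_devices stages
    search(0, k, 0.0, ())

print(f"best cuts after layers {best['cuts']}: period {best['period']:.1f} ms "
      f"-> {1000 / best['period']:.1f} inferences/s")
```

On these toy numbers the search selects cuts after layers 0 and 2, giving a 9 ms pipeline period (about 111 inferences per second) versus 20 ms on a single device, which illustrates how partitioning can pay off when per-cut transmission latencies stay below the resulting stage latencies.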
