Abstract
This work presents DIAMOND, a deep neural network (DNN) computation offloading scheme that combines a lightweight client-to-server network latency profiling component with a server-side inference time estimation module to accurately assess the expected end-to-end latency of a deep learning model inference. The network and server latency predictions are used jointly to make dynamic (partial) model offloading decisions at the client at run time. Compared to previous work, DIAMOND aims to minimize the overhead of network latency estimation and accounts for the concurrent processing nature of state-of-the-art deep learning inference server designs. Our extensive evaluations with an NVIDIA Jetson Nano client connected to an NVIDIA Triton server show that DIAMOND completes inference operations with noticeably lower computational/energy overhead and latency than previously proposed model offloading approaches. Furthermore, our results show that DIAMOND adapts well to practical server load and network dynamics.
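The abstract describes a split-point decision driven by two latency estimates, but the paper body specifies the actual algorithm. As a rough illustration only, the sketch below shows one plausible way a client could combine a network latency estimate with server-side timing estimates to choose a partition layer. All names, inputs (per-layer timings, boundary tensor sizes, a server queueing term), and the brute-force search are our assumptions for exposition, not DIAMOND's published method.

```python
def estimate_network_latency(rtt_s: float, bandwidth_bps: float, payload_bits: float) -> float:
    """Estimated time to ship an intermediate tensor to the server:
    one-way propagation plus serialization delay (hypothetical model)."""
    return rtt_s / 2 + payload_bits / bandwidth_bps


def choose_split_point(client_layer_s, server_layer_s, boundary_bits,
                       rtt_s, bandwidth_bps, server_queue_s):
    """Pick the layer index k that minimizes expected end-to-end latency
    when layers 0..k-1 run on the client and layers k.. run on the server.

    client_layer_s[i] : profiled client execution time of layer i (s)
    server_layer_s[i] : estimated server execution time of layer i (s)
    boundary_bits[k]  : size of the tensor sent if we split before layer k
                        (boundary_bits[0] = raw model input size)
    server_queue_s    : estimated queueing delay at the concurrent server

    Returns (split_index, expected_latency); split_index == n keeps
    everything local. The return trip for the (small) result is ignored
    here for simplicity.
    """
    n = len(client_layer_s)
    best = (n, sum(client_layer_s))  # fully-local baseline
    for k in range(n):  # offload layers k..n-1
        total = (sum(client_layer_s[:k])
                 + estimate_network_latency(rtt_s, bandwidth_bps, boundary_bits[k])
                 + server_queue_s
                 + sum(server_layer_s[k:]))
        if total < best[1]:
            best = (k, total)
    return best


if __name__ == "__main__":
    # Hypothetical per-layer profiles for a 4-layer model.
    client_t = [0.030, 0.025, 0.040, 0.020]   # client times (s)
    server_t = [0.004, 0.003, 0.006, 0.002]   # server times (s)
    sizes    = [8e6, 2e6, 1e6, 5e5]           # boundary tensor sizes (bits)
    split, latency = choose_split_point(client_t, server_t, sizes,
                                        rtt_s=0.020, bandwidth_bps=50e6,
                                        server_queue_s=0.005)
    print(f"offload from layer {split}, expected latency {latency * 1000:.1f} ms")
```

With these made-up numbers, shipping the (large) raw input is slower than running locally, but splitting after the first layer wins because the intermediate tensor is much smaller; this is the kind of trade-off a dynamic partial-offloading decision navigates as network and server conditions change.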