Abstract

Deep learning services are in heavy demand and are expected to have a powerful impact in a wide range of applications, such as autonomous driving and voice assistants.

Traditionally, deep learning services are provided mainly through cloud computing, referred to as cloud intelligence services, where the deep learning model is deployed in the cloud and end users must upload data through the wireless and core networks when requesting training and inference services. However, cloud computing-based deep learning services have deficiencies in latency, privacy, and other respects. For example, deep learning service users do not want their private data to leak, and this privacy problem is difficult to solve once the data is uploaded to the cloud. With the widespread use of the Internet of Things (IoT) and the rapid development of mobile devices such as smartphones and IoT sensors, large amounts of data need to be used for a variety of real-time deep learning services, such as target recognition and voice recognition for smart cities, smart medical care, and the Internet of Vehicles (IoVs). Given limited network bandwidth, uploading large amounts of data to the cloud causes network congestion and greatly increases response time. To meet low-latency requirements, researchers have begun to consider deploying deep learning services at the edge, i.e., edge intelligence services.

In edge intelligence services, the computation capability and memory of processors (or devices) vary over a large range. At the same time, the memory requirements of deep neural network (DNN) models keep growing; for example, the memory usage of AlexNet and ResNet is 2.12 GB and 16.20 GB, respectively. In addition, branches are becoming common in DNN model design, which introduces parallelism. Deploying a deep learning model across multiple processors, so that its computation can be conducted in parallel, can support large-scale DNN models and is a possible way to improve the efficiency of edge intelligence services. The key question in edge intelligence services is how to partition the DNN model and assign its implementation.

In this paper, we propose a novel latency-driven deep learning model placement method for efficient edge intelligence services. Model placement consists of two procedures: model partitioning and sub-model assignment. In our method, we first convert a DNN model into an execution graph, which is a directed acyclic graph (DAG), and propose a novel latency-driven multilevel graph partition for the model. The partitioned sub-models are then heuristically assigned to available processors. To the best of our knowledge, this is the first work that proposes latency-driven graph partition algorithms for model placement. Extensive experiments on several commonly used DNN models and synthetic datasets show that our method achieves the lowest execution latency with low complexity compared with other state-of-the-art model placement methods.
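To illustrate the overall idea described above (a DNN as a DAG of layers, cut into sub-models that are then assigned to processors), here is a minimal Python sketch. It is not the paper's latency-driven multilevel partition or its assignment heuristic: the toy branched model, the per-layer latency numbers, the hand-chosen partition, and the greedy load-balancing helper `greedy_assign` are all hypothetical, used only to show how such a placement could be expressed.

```python
# Minimal sketch: a toy branched DNN as a DAG of layers, a hand-chosen
# partition into sub-models, and a greedy assignment of sub-models to
# the least-loaded processor. All numbers and names are hypothetical.

# DAG of layers: edges follow the data flow of a toy branched model.
edges = [
    ("input", "conv1"), ("conv1", "branch_a"), ("conv1", "branch_b"),
    ("branch_a", "concat"), ("branch_b", "concat"), ("concat", "fc"),
]

# Hypothetical per-layer execution latency estimates (ms).
layer_latency = {
    "input": 0.0, "conv1": 4.0, "branch_a": 6.0,
    "branch_b": 5.0, "concat": 1.0, "fc": 2.0,
}

# A hand-chosen partition into sub-models (stand-in for the paper's
# latency-driven multilevel graph partition).
sub_models = [
    ["input", "conv1"],   # shared stem
    ["branch_a"],         # parallel branch A
    ["branch_b"],         # parallel branch B
    ["concat", "fc"],     # merge and head
]

def greedy_assign(sub_models, layer_latency, num_processors=2):
    """Assign each sub-model to the processor with the least accumulated load."""
    load = [0.0] * num_processors      # accumulated latency per processor
    assignment = {}
    for i, layers in enumerate(sub_models):
        cost = sum(layer_latency[layer] for layer in layers)
        p = min(range(num_processors), key=lambda k: load[k])
        load[p] += cost
        assignment[i] = p
    return assignment, load

if __name__ == "__main__":
    assignment, load = greedy_assign(sub_models, layer_latency)
    for i, p in assignment.items():
        print(f"sub-model {i} ({sub_models[i]}) -> processor {p}")
    print("per-processor load (ms):", load)
```

Note that this sketch balances only computation load; a placement method like the one the abstract describes would also have to account for the DAG's dependency structure and the communication latency between processors when deciding where to cut and where to place each sub-model.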
