Abstract

Deep learning services are in heavy demand and are expected to have a powerful impact in a wide range of applications, such as autonomous driving and voice assistants.

Traditionally, deep learning services are provided mainly through cloud computing, referred to as cloud intelligence services, where the deep learning model is deployed in the cloud and end users must upload data through the wireless and core networks when requesting training and inference services. However, cloud computing-based deep learning services have deficiencies in latency, privacy, and other respects. For example, deep learning service users do not want their private data to leak, and this privacy problem is difficult to solve once the data is uploaded to the cloud. With the widespread use of the Internet of Things (IoT) and the rapid development of mobile devices such as smartphones and IoT sensors, large amounts of data need to be used for a variety of real-time deep learning services, such as target recognition and voice recognition for smart cities, smart medical care, and the Internet of Vehicles (IoVs). Given limited network bandwidth, uploading large amounts of data to the cloud causes network congestion and greatly increases response time. To meet low-latency requirements, researchers have begun to consider deploying deep learning services at the edge, i.e., edge intelligence services.

In edge intelligence services, the computation capability and memory of processors (or devices) vary over a large range. At the same time, the memory requirements of deep neural network (DNN) models keep growing; for example, the memory usage of AlexNet and ResNet is 2.12 GB and 16.20 GB, respectively. In addition, branches are becoming common in DNN model design, which introduces parallelism. Deploying a deep learning model across multiple processors, so that its computation can be conducted in parallel, can support large-scale DNN models and is a possible way to improve the efficiency of edge intelligence services. The key question in edge intelligence services is how to partition the DNN model and assign its implementation.

In this paper, we propose a novel latency-driven deep learning model placement method for efficient edge intelligence services. Model placement consists of two procedures: model partitioning and sub-model assignment. In our method, we first convert a DNN model into an execution graph, which is a directed acyclic graph (DAG), and propose a novel latency-driven multilevel graph partition for the model. The partitioned sub-models are then heuristically assigned to available processors. To the best of our knowledge, this is the first work that proposes latency-driven graph partition algorithms for model placement. Extensive experiments on several commonly used DNN models and synthetic datasets show that our method achieves the lowest execution latency with low complexity compared with other state-of-the-art model placement methods.
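To illustrate the overall idea described above (a DNN as a DAG of layers, cut into sub-models that are then assigned to processors), here is a minimal Python sketch. It is not the paper's latency-driven multilevel partition or its assignment heuristic: the toy branched model, the per-layer latency numbers, the hand-chosen partition, and the greedy load-balancing helper `greedy_assign` are all hypothetical, used only to show how such a placement could be expressed.

```python
# Minimal sketch: a toy branched DNN as a DAG of layers, a hand-chosen
# partition into sub-models, and a greedy assignment of sub-models to
# the least-loaded processor. All numbers and names are hypothetical.

# DAG of layers: edges follow the data flow of a toy branched model.
edges = [
    ("input", "conv1"), ("conv1", "branch_a"), ("conv1", "branch_b"),
    ("branch_a", "concat"), ("branch_b", "concat"), ("concat", "fc"),
]

# Hypothetical per-layer execution latency estimates (ms).
layer_latency = {
    "input": 0.0, "conv1": 4.0, "branch_a": 6.0,
    "branch_b": 5.0, "concat": 1.0, "fc": 2.0,
}

# A hand-chosen partition into sub-models (stand-in for the paper's
# latency-driven multilevel graph partition).
sub_models = [
    ["input", "conv1"],   # shared stem
    ["branch_a"],         # parallel branch A
    ["branch_b"],         # parallel branch B
    ["concat", "fc"],     # merge and head
]

def greedy_assign(sub_models, layer_latency, num_processors=2):
    """Assign each sub-model to the processor with the least accumulated load."""
    load = [0.0] * num_processors      # accumulated latency per processor
    assignment = {}
    for i, layers in enumerate(sub_models):
        cost = sum(layer_latency[layer] for layer in layers)
        p = min(range(num_processors), key=lambda k: load[k])
        load[p] += cost
        assignment[i] = p
    return assignment, load

if __name__ == "__main__":
    assignment, load = greedy_assign(sub_models, layer_latency)
    for i, p in assignment.items():
        print(f"sub-model {i} ({sub_models[i]}) -> processor {p}")
    print("per-processor load (ms):", load)
```

Note that this sketch balances only computation load; a placement method like the one the abstract describes would also have to account for the DAG's dependency structure and the communication latency between processors when deciding where to cut and where to place each sub-model.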
