Abstract

With the growing popularity of mobile devices, intelligent applications such as face recognition, voice assistants, and gesture recognition have become part of daily life. However, due to their limited computing capacity, mobile devices struggle to support complex Deep Neural Network (DNN) inference. To relieve the pressure on these devices, traditional methods upload part or all of the DNN model to a cloud server and can only serve DNN queries once the entire model has been uploaded. To achieve real-time DNN queries, we instead exploit collaboration among local devices, edge servers, and the cloud, serving DNN queries while DNN partitions are still being uploaded. In this paper, we propose an Efficient offloading scheme for DNN Inference Acceleration (EosDNN) in a local-edge-cloud collaborative environment, where inference acceleration comes from minimizing migration delay and realizing real-time DNN queries. EosDNN jointly considers the migration plan and the uploading plan: for the former, a Particle Swarm Optimization with Genetic Algorithm (PSO-GA) is applied to find the distribution of DNN layers across servers that minimizes migration delay; for the latter, a Layer Merge Uploading Algorithm (LMU) is proposed to determine DNN partitions and their upload order so that query performance remains high during migration. Experimental results demonstrate that EosDNN scales to large DNN model migration, achieves low migration delay, and produces a finer-grained DNN partition uploading plan, thereby improving DNN query performance.
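To make the migration-plan idea concrete, the sketch below shows one way a discrete PSO-GA hybrid could assign DNN layers to servers so as to minimize migration delay. The abstract does not specify the paper's encoding, cost model, or operators, so everything here is a hypothetical illustration: the layer sizes, bandwidths, the makespan-style delay model, and the crossover/mutation update are all assumptions, not EosDNN's actual algorithm.

import random

random.seed(0)

# Hypothetical problem data (not taken from the paper).
LAYER_SIZES_MB = [4.0, 12.0, 8.0, 20.0, 6.0, 10.0]    # per-layer model size
SERVER_BW_MBPS = {"edge": 40.0, "cloud": 15.0}         # uplink bandwidth per server
SERVERS = list(SERVER_BW_MBPS)

def migration_delay(assignment):
    """Assumed makespan-style cost: each server receives its assigned layers
    sequentially over its link, and migration ends when the slowest link finishes."""
    per_server = {s: 0.0 for s in SERVERS}
    for size, server in zip(LAYER_SIZES_MB, assignment):
        per_server[server] += size / SERVER_BW_MBPS[server]
    return max(per_server.values())

def crossover(a, b, rate=0.5):
    """Uniform crossover: pull each layer's placement toward the guide solution."""
    return [g2 if random.random() < rate else g1 for g1, g2 in zip(a, b)]

def mutate(a, rate=0.1):
    """Randomly reassign a layer to another server with small probability."""
    return [random.choice(SERVERS) if random.random() < rate else g for g in a]

def pso_ga(pop_size=20, iters=100):
    """Discrete PSO with GA operators standing in for the velocity update."""
    pop = [[random.choice(SERVERS) for _ in LAYER_SIZES_MB] for _ in range(pop_size)]
    pbest = list(pop)
    gbest = min(pop, key=migration_delay)
    for _ in range(iters):
        for i, particle in enumerate(pop):
            # Move each particle toward its personal best, then the global best,
            # then apply mutation to keep exploring new placements.
            candidate = mutate(crossover(crossover(particle, pbest[i]), gbest))
            pop[i] = candidate
            if migration_delay(candidate) < migration_delay(pbest[i]):
                pbest[i] = candidate
        gbest = min(pbest, key=migration_delay)
    return gbest, migration_delay(gbest)

if __name__ == "__main__":
    plan, delay = pso_ga()
    print("layer placement:", plan)
    print("estimated migration delay (s):", round(delay, 3))

Running this prints one layer-to-server placement and its estimated delay under the assumed cost model; the actual PSO-GA in EosDNN may use a different encoding and a measured, rather than assumed, delay model.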
