Abstract

As the primary driver of intelligent mobile applications, deep neural networks (DNNs) have gradually been deployed to millions of mobile devices, generating massive numbers of latency-sensitive and computation-intensive tasks daily. Mobile edge computing places computing resources at the network edge, which enables fine-grained offloading of DNN inference tasks from mobile devices to edge nodes. However, most existing studies have not systematically considered three crucial performance aspects: scheduling multiple streams of DNN inference tasks, leveraging multi-exit models to accelerate task processing, and partitioning inference models for partial offloading. To this end, this paper proposes an adaptive inference framework for mobile edge computing that dynamically selects the exit point and partition point for multiple inference task streams. We design a dynamic programming algorithm to obtain an efficient solution under the ideal condition that task arrival information is known in advance. Furthermore, we design a learning-based algorithm for online scheduling, whose training efficiency is improved through historical experience initialization and prioritized experience replay. Experimental results show that, compared with the greedy algorithm, the online algorithm improves performance under two varying environmental parameters by an average of 5.9% and 32%, respectively.
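To make the online scheduling ingredients concrete, the sketch below illustrates a prioritized replay buffer that is pre-filled with historical experience before online training, as mentioned in the abstract. This is a minimal illustration under our own assumptions, not the paper's implementation; the class name, the alpha/beta hyperparameters, and the placeholder transitions are all hypothetical.

```python
# Minimal sketch (assumed, not the paper's code) of prioritized experience
# replay with historical-experience initialization.
import numpy as np


class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities bias sampling
        self.buffer = []            # stored transitions
        self.priorities = []        # one priority per transition
        self.pos = 0

    def add(self, transition, priority=1.0):
        """Insert a transition; new samples get the current max priority."""
        p = max(self.priorities, default=priority)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(p)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        """Sample proportionally to priority^alpha and return IS weights."""
        probs = np.array(self.priorities) ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        weights = (len(self.buffer) * probs[idxs]) ** (-beta)
        weights /= weights.max()    # normalize for stable updates
        batch = [self.buffer[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        """Refresh priorities from the latest TD errors."""
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(float(err)) + eps


# Historical-experience initialization: pre-fill the buffer with logged
# (state, action, reward, next_state) tuples before online training starts.
buf = PrioritizedReplayBuffer(capacity=10_000)
historical = [((0,), 1, 0.5, (1,)), ((1,), 0, 0.2, (2,))]   # placeholder data
for tr in historical:
    buf.add(tr)
```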
