Abstract

The quality of service (QoS) of intelligent applications on mobile devices depends heavily on the inference speed of Deep Neural Network (DNN) models. Cooperative DNN inference, in which a mobile device offloads part of its inference task to cloud servers, has become an effective way to reduce inference latency. The large communication volume is usually the bottleneck of such systems. Prior research focuses on reducing the communication volume by finding optimal partition points. We observe that the computation and communication resources on mobile devices can work in a pipeline, which hides communication time behind computation and further reduces inference latency. Based on this observation, we formulate the offloading pipeline scheduling problem: we aim to find the optimal sequence of DNN execution and offloading on the mobile device such that inference latency is minimized. When a DNN is modeled as a directed acyclic graph (DAG), the complex precedence constraints in the DAG make this problem challenging. Noticing that most DNN models have independent paths or tree structures, we present an optimal path-wise DAG scheduler and an optimal layer-wise scheduler for tree-structured DAGs. We then propose a heuristic based on topological sort to schedule DAGs of general structure. We implement a prototype of our offloading scheme on a real-world testbed, with a Raspberry Pi as the mobile device and lab PCs as the cloud. Experiments on various DNN models show that our scheme reduces their inference latency under different network conditions.
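
To make the topological-sort heuristic concrete, the sketch below shows one plausible greedy scheduler for the pipelined offloading setting the abstract describes. It is an illustration under our own assumptions, not the paper's algorithm: the function name `pipeline_schedule`, its inputs, and the tie-breaking rule are all hypothetical. It models the device CPU and the radio as two resources running in parallel, so the transmission of an offloaded layer's output overlaps with the computation of later layers.

```python
def pipeline_schedule(layers, succ, compute_time, upload_time, offloaded):
    """Greedy topological-sort heuristic for pipelined offloading (a sketch).

    layers:       list of layer ids
    succ:         dict layer -> list of successor layers (the DNN's DAG)
    compute_time: dict layer -> on-device compute time
    upload_time:  dict layer -> time to transmit the layer's output
    offloaded:    set of layers whose outputs must be sent to the server

    Returns (order, latency): an execution order for the device and the
    time at which both the CPU and the radio become idle.
    """
    indeg = {v: 0 for v in layers}
    for u in layers:
        for v in succ[u]:
            indeg[v] += 1

    ready = [v for v in layers if indeg[v] == 0]
    cpu_free, net_free = 0.0, 0.0
    done = {}      # layer -> time its output becomes available on-device
    order = []

    while ready:
        # Tie-break: prefer layers feeding the uplink, so uploads start
        # early and transmission hides behind later computation.
        ready.sort(key=lambda v: (v not in offloaded, compute_time[v]))
        v = ready.pop(0)
        order.append(v)
        cpu_free += compute_time[v]        # CPU computes layers one by one
        done[v] = cpu_free
        if v in offloaded:
            # Radio starts once the output exists and the link is free.
            net_free = max(net_free, done[v]) + upload_time[v]
        for w in succ[v]:                  # release successors in the DAG
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)

    return order, max(cpu_free, net_free)


if __name__ == "__main__":
    # Diamond-shaped DAG: 0 -> {1, 2} -> 3; outputs of 1 and 2 offloaded.
    succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
    order, latency = pipeline_schedule(
        layers=[0, 1, 2, 3], succ=succ,
        compute_time={0: 2, 1: 3, 2: 1, 3: 2},
        upload_time={1: 4, 2: 2}, offloaded={1, 2},
    )
    print(order, latency)  # [0, 2, 1, 3] 10.0 -- upload of 2 overlaps with 1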
