Abstract

The emerging edge computing technique supports delay-sensitive, compute-intensive tasks such as deep neural network inference by offloading them from a user-end device to an edge server for fast execution. As the number of offloaded tasks on an edge server grows, they increasingly contend for both network and computation resources. Existing offloading approaches usually partition the deep neural network at a point where the amount of transmitted data is small in order to save network resources, but rarely consider the computation resource shortage on the edge server. In this paper, we design LoADPart, a deep neural network offloading system. LoADPart dynamically and jointly analyzes the available network bandwidth and the computation load of the edge server, and uses a lightweight algorithm to decide where to partition the deep neural network so as to minimize the end-to-end inference latency. We implement LoADPart for MindSpore, a deep learning framework supporting edge AI, and compare it with state-of-the-art solutions in experiments on 6 deep neural networks. The results show that under varying server computation load, LoADPart reduces the end-to-end latency by 14.2% on average and by up to 32.3% in some specific cases.
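
The core decision described in the abstract is where to split the network given the current bandwidth and server load. The sketch below illustrates that idea in Python; the per-layer profiles, the linear server-load model, and all names are assumptions made for illustration, not LoADPart's actual algorithm or MindSpore's API.

```python
# Minimal sketch (assumed, not LoADPart's implementation): pick a DNN partition
# point by estimating end-to-end latency for every candidate split, given the
# measured bandwidth and the current server load.
from dataclasses import dataclass
from typing import List


@dataclass
class LayerProfile:
    device_ms: float      # measured per-layer latency on the end device
    server_ms: float      # measured per-layer latency on an idle edge server
    output_bytes: int     # size of the layer's output activation


def estimate_latency(layers: List[LayerProfile], split: int, input_bytes: int,
                     bandwidth_bps: float, server_load_factor: float) -> float:
    """Latency if layers [0, split) run on-device and layers [split, n) on the server.

    server_load_factor >= 1.0 inflates server compute time under contention
    (a simple linear load model, assumed here for illustration).
    """
    device_ms = sum(l.device_ms for l in layers[:split])
    server_ms = sum(l.server_ms for l in layers[split:]) * server_load_factor
    if split == len(layers):
        tx_ms = 0.0  # fully local execution: nothing is transmitted
    else:
        # Transmit the activation at the split, or the raw input if split == 0.
        tx_bytes = layers[split - 1].output_bytes if split > 0 else input_bytes
        tx_ms = tx_bytes * 8 / bandwidth_bps * 1000
    return device_ms + tx_ms + server_ms


def choose_partition(layers: List[LayerProfile], input_bytes: int,
                     bandwidth_bps: float, server_load_factor: float) -> int:
    """Return the split index that minimizes the estimated end-to-end latency."""
    candidates = range(len(layers) + 1)  # 0 = fully offloaded, len(layers) = fully local
    return min(candidates,
               key=lambda s: estimate_latency(layers, s, input_bytes,
                                              bandwidth_bps, server_load_factor))
```

Because the candidate splits are just the layer boundaries, this brute-force scan is linear in network depth, which is consistent with the abstract's claim that the partition decision can be made with a lightweight algorithm and re-run whenever bandwidth or server load changes.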
