Abstract

The emerging edge computing technique supports delay-sensitive, compute-intensive tasks such as deep neural network inference by offloading them from a user-end device to an edge server for fast execution. As the number of offloaded tasks on an edge server grows, they increasingly contend for both network and computation resources. Existing offloading approaches usually partition the deep neural network at a point where the amount of transmitted data is small in order to save network resources, but rarely consider the computation resource shortage on the edge server. In this paper, we design LoADPart, a deep neural network offloading system. LoADPart dynamically and jointly analyzes the available network bandwidth and the computation load of the edge server, and uses a lightweight algorithm to decide where to partition the deep neural network so as to minimize the end-to-end inference latency. We implement LoADPart for MindSpore, a deep learning framework supporting edge AI, and compare it with state-of-the-art solutions in experiments on 6 deep neural networks. The results show that under varying server computation load, LoADPart reduces the end-to-end latency by 14.2% on average and by up to 32.3% in some specific cases.
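
The core decision described in the abstract is where to split the network given the current bandwidth and server load. The sketch below illustrates that idea in Python; the per-layer profiles, the linear server-load model, and all names are assumptions made for illustration, not LoADPart's actual algorithm or MindSpore's API.

```python
# Minimal sketch (assumed, not LoADPart's implementation): pick a DNN partition
# point by estimating end-to-end latency for every candidate split, given the
# measured bandwidth and the current server load.
from dataclasses import dataclass
from typing import List


@dataclass
class LayerProfile:
    device_ms: float      # measured per-layer latency on the end device
    server_ms: float      # measured per-layer latency on an idle edge server
    output_bytes: int     # size of the layer's output activation


def estimate_latency(layers: List[LayerProfile], split: int, input_bytes: int,
                     bandwidth_bps: float, server_load_factor: float) -> float:
    """Latency if layers [0, split) run on-device and layers [split, n) on the server.

    server_load_factor >= 1.0 inflates server compute time under contention
    (a simple linear load model, assumed here for illustration).
    """
    device_ms = sum(l.device_ms for l in layers[:split])
    server_ms = sum(l.server_ms for l in layers[split:]) * server_load_factor
    if split == len(layers):
        tx_ms = 0.0  # fully local execution: nothing is transmitted
    else:
        # Transmit the activation at the split, or the raw input if split == 0.
        tx_bytes = layers[split - 1].output_bytes if split > 0 else input_bytes
        tx_ms = tx_bytes * 8 / bandwidth_bps * 1000
    return device_ms + tx_ms + server_ms


def choose_partition(layers: List[LayerProfile], input_bytes: int,
                     bandwidth_bps: float, server_load_factor: float) -> int:
    """Return the split index that minimizes the estimated end-to-end latency."""
    candidates = range(len(layers) + 1)  # 0 = fully offloaded, len(layers) = fully local
    return min(candidates,
               key=lambda s: estimate_latency(layers, s, input_bytes,
                                              bandwidth_bps, server_load_factor))
```

Because the candidate splits are just the layer boundaries, this brute-force scan is linear in network depth, which is consistent with the abstract's claim that the partition decision can be made with a lightweight algorithm and re-run whenever bandwidth or server load changes.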
