Abstract

With the increase in model parameters, deep neural networks (DNNs) have achieved remarkable performance in computer vision, but their growing size creates a bottleneck when deploying them on resource-constrained edge devices. Cloud–edge collaborative inference based on network pruning offers a way to deploy DNNs on edge devices. However, the pruning methods adopted by existing frameworks are only locally effective, and the compressed models tend to be over-sparse. In this paper, we design a cloud–edge collaborative inference framework based on network pruning that makes full use of the limited computing resources on edge devices. Within this framework, we propose a sparsity-aware feature bias minimization pruning method that reduces the feature bias introduced by network pruning and prevents the pruned model from becoming over-sparse. To further reduce inference latency, we account for the difference in computing resources between edge devices and the cloud and design a task-oriented asymmetric feature coding scheme that reduces the communication overhead of transmitting intermediate data. Comprehensive experiments show that our framework reduces end-to-end latency by 82% to 84% with less than 1% accuracy loss compared to a cloud–edge collaborative inference framework using traditional methods, and achieves the lowest end-to-end latency and accuracy loss among the compared frameworks.
