Abstract

Edge-cloud collaborative inference can significantly reduce the delay of a deep neural network (DNN) by dividing the network between the mobile edge and the cloud. However, the size of a DNN's intermediate-layer data is usually larger than that of the original input, so the communication time needed to send the intermediate data to the cloud also increases end-to-end latency. To cope with these challenges, this paper proposes a novel convolutional neural network structure, BBNet, that accelerates collaborative inference at two levels: (1) channel pruning, which reduces the number of computations and parameters of the original network; and (2) compression of the feature map at the split point, which further reduces the amount of data transmitted. In addition, this paper implements the BBNet structure on an NVIDIA Nano device and a server. Compared with the original network, BBNet achieves compression rates of up to 5.67× in FLOPs and 11.57× in parameters. In the best case, the feature-compression layer reaches a bit-compression rate of 512×. BBNet's reduction in inference delay is more pronounced under poor network conditions than under good bandwidth. For example, when the upload bandwidth is only 20 kb/s, the end-to-end latency of BBNet is 38.89× lower than that of the cloud-only approach.
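
As a rough illustration of the second level, the PyTorch sketch below shows one way a feature-compression layer at the split point could look: the encoder runs on the edge device, the quantized bottleneck is uploaded, and the decoder restores the feature map in the cloud. The channel counts, stride, and 8-bit quantization are assumptions for illustration, not BBNet's published design.

```python
import torch
import torch.nn as nn

class FeatureCompressor(nn.Module):
    """Illustrative bottleneck at the split point: the encoder runs on the
    edge device, the decoder on the cloud. Channel counts, stride, and the
    8-bit quantization are assumptions, not BBNet's exact design."""

    def __init__(self, in_channels=256, bottleneck_channels=8):
        super().__init__()
        # Edge side: shrink channels and spatial size before transmission.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, bottleneck_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Cloud side: restore the feature map for the remaining layers.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(bottleneck_channels, in_channels, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def compress(self, feature_map):
        z = self.encoder(feature_map)
        # Quantize float32 activations to uint8 (4x fewer bits) before upload.
        z_min, z_max = z.min(), z.max()
        scale = (z_max - z_min).clamp(min=1e-8) / 255.0
        q = ((z - z_min) / scale).round().to(torch.uint8)
        return q, z_min, scale

    def decompress(self, q, z_min, scale):
        # De-quantize and expand back to the original feature-map shape.
        z = q.float() * scale + z_min
        return self.decoder(z)
```

With these illustrative shapes the payload shrinks by 32× (channels) × 4× (spatial) × 4× (bit width) = 512×, which happens to match the best-case rate quoted above; BBNet's actual compression layer may use different shapes.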

Highlights

  • In recent years, deep learning has achieved impressive performance in various smart application scenarios [1,2]

  • We propose a novel convolutional neural network structure for edge-cloud collaborative inference that reduces end-to-end latency by accelerating inference from two directions

  • Based on the related work, we propose the BBNet structure shown in Figure 1, which combines three technologies: model compression, deep neural network (DNN) model partition, and feature compression (a minimal pruning sketch follows this list)
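
As a minimal sketch of the model-compression step, the snippet below applies generic L1-norm channel pruning to a single convolution. The pruning criterion and keep ratio are assumptions for illustration, since this summary does not describe BBNet's exact pruning rule.

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5):
    """Generic L1-norm channel pruning: keep the output channels whose filters
    have the largest L1 norm. Criterion and keep_ratio are illustrative only."""
    num_keep = max(1, int(conv.out_channels * keep_ratio))
    # Rank filters by the L1 norm of their weights.
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep_idx = torch.argsort(scores, descending=True)[:num_keep]

    pruned = nn.Conv2d(conv.in_channels, num_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep_idx])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep_idx])
    # Note: in a full network the next layer's input channels must be pruned
    # with the same keep_idx to stay consistent.
    return pruned, keep_idx

# Example: pruning half the output channels of a 3x3 convolution.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
pruned, kept = prune_conv_channels(conv, keep_ratio=0.5)
print(conv.weight.numel(), pruned.weight.numel())  # parameter count roughly halves
```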

Summary

Introduction

Deep learning has achieved impressive performance in various smart application scenarios [1,2]. In the cloud-only approach, the DNN model is deployed in the cloud, the original data are sent directly to the cloud, and the inference result is returned. In this way, the original data are transmitted over the channel, which threatens sensitive data and increases communication delay. Kang et al. [4] proposed a new method of partitioned deployment of deep neural networks to enable joint inference between the edge device and the cloud: the edge device runs the early layers of the network and uploads the intermediate feature data to the cloud, which executes the remaining layers.
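
The partitioned deployment described above can be sketched as follows, assuming an off-the-shelf CNN (torchvision's ResNet-18) and an arbitrary split index; BBNet chooses its own split point and adds pruning and feature compression around it.

```python
import io
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Illustrative partition of a generic CNN in the spirit of Kang et al. [4];
# the model choice and split index are assumptions for this sketch.
model = resnet18(weights=None).eval()
layers = list(model.children())          # coarse-grained blocks of ResNet-18
split = 6                                # edge runs blocks [0, 6); cloud runs the rest
edge_part = nn.Sequential(*layers[:split])
cloud_part = nn.Sequential(*layers[split:-1], nn.Flatten(), layers[-1])

def edge_inference(image: torch.Tensor) -> bytes:
    """Run the early layers on the edge device and serialize the feature map."""
    with torch.no_grad():
        features = edge_part(image)
    buf = io.BytesIO()
    torch.save(features, buf)            # stand-in for the real upload payload
    return buf.getvalue()

def cloud_inference(payload: bytes) -> torch.Tensor:
    """Deserialize the intermediate features and run the remaining layers."""
    features = torch.load(io.BytesIO(payload))
    with torch.no_grad():
        return cloud_part(features)

logits = cloud_inference(edge_inference(torch.randn(1, 3, 224, 224)))
print(logits.shape)                      # torch.Size([1, 1000])
```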

