Abstract

This paper studies inference acceleration using distributed CNNs in a collaborative edge computing network. To ensure that task partitioning causes no loss of inference accuracy, we propose receptive field-based segmentation. To reduce computation time and communication overhead, we propose a novel collaborative edge computing scheme that uses fused-layer parallelization to partition a CNN model into multiple blocks. To find the optimal partition of a CNN model, we use a dynamic programming method named DPFP. To address the computation heterogeneity of edge servers (ESs), we design a low-complexity search algorithm that selects the optimal subset of collaborative ESs for inference. Experimental results show that DPFP accelerates inference by up to 71% for ResNet-50 and 73% for VGG-16 compared with running the pre-trained models, outperforming the existing works MoDNN and DeepSlicing. Moreover, we propose an analytical method that estimates the speedup ratio on different GPU platforms from FLOPs and effective computing capacity. Furthermore, we evaluate the service failure probability under time-variant channels and varying image sizes; the results show that DPFP is effective in ensuring high service reliability under strict service deadlines.
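The abstract does not spell out how receptive field-based segmentation is computed, so the following is only a minimal sketch of the standard receptive-field calculation it presumably builds on, not the paper's implementation. It assumes a fused block of convolution or pooling layers, each described by a hypothetical `(kernel, stride, padding)` tuple along the height axis, and works backwards from a range of output rows to the range of input rows an edge server would need to produce those rows exactly. The function names are illustrative.

```python
from typing import List, Tuple

# Hypothetical layer description: (kernel, stride, padding) along the height axis.
Layer = Tuple[int, int, int]


def layer_sizes(input_rows: int, layers: List[Layer]) -> List[int]:
    """Row count at the input of each layer, plus the final output row count."""
    sizes = [input_rows]
    for kernel, stride, padding in layers:
        sizes.append((sizes[-1] + 2 * padding - kernel) // stride + 1)
    return sizes


def required_input_rows(
    out_start: int, out_end: int, input_rows: int, layers: List[Layer]
) -> Tuple[int, int]:
    """Inclusive input row range needed to compute output rows out_start..out_end."""
    sizes = layer_sizes(input_rows, layers)
    lo, hi = out_start, out_end
    # Walk the fused block from its last layer back to its first.
    for (kernel, stride, padding), in_rows in zip(
        reversed(layers), reversed(sizes[:-1])
    ):
        # Rows that fall into the zero padding are clamped to the real tensor.
        lo = max(0, lo * stride - padding)
        hi = min(in_rows - 1, hi * stride - padding + kernel - 1)
    return lo, hi


if __name__ == "__main__":
    block = [(3, 1, 1), (3, 1, 1)]  # two fused 3x3 convolutions
    print(required_input_rows(0, 3, 8, block))  # top half of the output
    print(required_input_rows(4, 7, 8, block))  # bottom half of the output
```

In this sketch, splitting an 8-row output into two halves makes the two required input ranges overlap. That overlap is the redundant computation and data that a partitioning scheme has to trade off against parallel speedup.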
