Abstract

In recent years, deep learning has achieved remarkable success in computer vision, natural language processing, and speech recognition, and related products have proliferated rapidly. Because deep neural network (DNN) models demand substantial storage and computation while mobile edge devices are often resource-constrained, efficiently deploying DNN models on such devices has attracted great attention from both academia and industry. There is strength in numbers, so we propose CCIED, a framework in which edge devices cooperate to complete DNN inference tasks. Because task inputs in mobile edge computing scenarios are often highly similar, CCIED caches the outputs of an intermediate layer of the neural network together with the corresponding labels. When a similar input is already present in the cache, the device skips the remaining computation and directly returns the cached result. One challenge of collaborative inference is that the communication overhead of transferring intermediate data can be significant. We therefore apply weight pruning only to the layer that produces the intermediate results, which greatly reduces the redundant parameters in those results and thus the time needed to transfer data between devices, while leaving the model's overall complexity essentially unchanged. Experimental results show that CCIED can efficiently deploy DNN models on edge devices with almost no loss of accuracy and can significantly reduce total latency on cache hits.
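
To illustrate the caching idea described above, the following is a minimal PyTorch sketch of an intermediate-feature cache placed at the split point of a network divided into a `head` and a `tail`. The names (`SemanticCache`, `head`, `tail`), the cosine-similarity threshold, and the FIFO eviction policy are illustrative assumptions for this sketch, not CCIED's actual implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: cache intermediate features and their labels at a DNN
# split point; on a sufficiently similar input, return the cached label and
# skip the remaining layers. Threshold and eviction policy are assumptions.

class SemanticCache:
    def __init__(self, threshold=0.9, capacity=256):
        self.threshold = threshold      # cosine-similarity threshold for a "hit"
        self.capacity = capacity
        self.keys = []                  # cached intermediate features
        self.labels = []                # labels associated with each feature

    def lookup(self, feature):
        """Return a cached label if a sufficiently similar feature exists."""
        flat = feature.flatten()
        for key, label in zip(self.keys, self.labels):
            sim = F.cosine_similarity(flat, key.flatten(), dim=0)
            if sim >= self.threshold:
                return label            # cache hit: remaining layers are skipped
        return None                     # cache miss

    def insert(self, feature, label):
        if len(self.keys) >= self.capacity:
            self.keys.pop(0)            # evict the oldest entry (FIFO)
            self.labels.pop(0)
        self.keys.append(feature.detach())
        self.labels.append(label)


def infer(x, head, tail, cache):
    """Run the head locally, consult the cache, and only run (or offload)
    the tail of the network on a cache miss."""
    feature = head(x)                   # intermediate result at the split layer
    label = cache.lookup(feature)
    if label is not None:
        return label                    # cached result, no further computation
    logits = tail(feature)              # on a miss, finish the inference
    label = logits.argmax(dim=-1)
    cache.insert(feature, label)
    return label
```

In a collaborative setting, `tail(feature)` would be executed on a peer device, so a cache hit avoids both the remaining computation and the transfer of the intermediate tensor.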
