Abstract

It is widely known that deep neural networks (DNNs) perform well in many applications and can sometimes exceed human ability. However, their computational cost limits their impact in a variety of real-world settings, such as IoT and mobile computing. Recently, many DNN compression and acceleration methods have been proposed to overcome this problem. Most of them succeed in reducing the number of parameters and FLOPs, but only a few actually reduce expected inference time, owing either to the overhead these methods introduce or to deficiencies in DNN frameworks. Edge-cloud computing has recently emerged and presents an opportunity for new model acceleration and compression techniques. To address the aforementioned problem, we propose a novel technique that speeds up expected inference time by using several networks that perform the exact same task with different strengths. Although our method is based on edge-cloud computing, it is suitable for any other hierarchical computing paradigm. Using a simple yet sufficiently strong estimator, the system predicts whether the data should be passed to a larger network or not. Extensive experimental results demonstrate that the proposed technique speeds up expected inference time and outperforms almost all state-of-the-art compression techniques, including pruning, low-rank approximation, knowledge distillation, and branchy-type networks, on both CPUs and GPUs.

Highlights

  • Deep neural networks (DNNs) have been widely used for various tasks: classification [1], [2], segmentation [3]–[5], recognition [6], [7], caption generation [8]–[10], and translation [11], [12]

  • The multilevel NN composed of ResNet-32 + ResNet-8 could not achieve higher accuracy than 10% soft filter pruning (SFP) applied to ResNet-32

  • The decider determines whether the input data should be passed to a higher, more capable level of the network than the current one


Summary

INTRODUCTION

Deep neural networks (DNNs) have been widely used for various tasks: classification [1], [2], segmentation [3]–[5], recognition [6], [7], caption generation [8]–[10], and translation [11], [12]. IoT devices, which usually have limited computation power, act as clients and send sensor data to the DNN model stored on a cloud server. Such server-client scenarios require high upstream bandwidth. The proposed approach combines the solutions of previous works: the cloud server stores a deep, powerful DNN model, while the edge device keeps a mini-version of that model. Because the model is not partitioned between the cloud server and the edge device, the original sensor data is sent to the cloud only when the confidence of the mini-version's prediction is not high enough. We compare our results with many state-of-the-art model compression methods, including pruning, low-rank approximation, knowledge distillation, and branchy-type networks.
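The confidence-based routing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the edge model outputs softmax logits and uses normalized entropy as the confidence estimator; the names `edge_model`, `cloud_model`, and the threshold value are hypothetical placeholders.

```python
# Minimal sketch of confidence-based edge-cloud routing (assumptions:
# both models return class logits for a single input; the entropy
# threshold is a tuning knob, not a value reported in the paper).
import torch
import torch.nn.functional as F


def normalized_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the softmax output, scaled to [0, 1]."""
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy / torch.log(torch.tensor(float(logits.shape[-1])))


def predict(x, edge_model, cloud_model, threshold: float = 0.3):
    """Run the small edge model first; send the original input to the
    larger cloud model only when the edge prediction is uncertain."""
    edge_logits = edge_model(x)
    if normalized_entropy(edge_logits).item() < threshold:
        return edge_logits.argmax(dim=-1)   # confident: answer locally
    return cloud_model(x).argmax(dim=-1)    # uncertain: offload raw input
```

Because the two models are independent (not a partitioned network), the only data crossing the network in the uncertain case is the original input, which matches the server-client setup outlined above.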

PREVIOUS WORK
METHODOLOGY
ENTROPY
BOOSTING
EXPERIMENTS
CIFAR DATASET
Findings
CONCLUSION