Abstract

It is widely known that deep neural networks (DNNs) perform well in many applications and can sometimes exceed human ability. However, their computational cost limits their impact in a variety of real-world settings, such as IoT and mobile computing. Recently, many DNN compression and acceleration methods have been proposed to overcome this problem. Most of them succeed in reducing the number of parameters and FLOPs, but only a few actually reduce expected inference time, owing either to the overhead these methods introduce or to deficiencies in DNN frameworks. Edge-cloud computing has recently emerged and presents an opportunity for new model acceleration and compression techniques. To address the aforementioned problem, we propose a novel technique that speeds up expected inference time by using several networks that perform the exact same task with different strengths. Although our method is based on edge-cloud computing, it is suitable for any other hierarchical computing paradigm. Using a simple yet sufficiently strong estimator, the system predicts whether the data should be passed to a larger network or not. Extensive experimental results demonstrate that the proposed technique speeds up expected inference time and outperforms almost all state-of-the-art compression techniques, including pruning, low-rank approximation, knowledge distillation, and branchy-type networks, on both CPUs and GPUs.

Highlights

  • Deep neural networks (DNNs) have been widely used for various tasks: classification [1], [2], segmentation [3]–[5], recognition [6], [7], caption generation [8]–[10], and translation [11], [12]

  • The multilevel NN composed of ResNet-32 + ResNet-8 could not achieve higher accuracy than 10% soft filter pruning (SFP) applied to ResNet-32

  • The decider determines whether the input data should be passed to a higher, more capable level of the network than the current one


Summary

INTRODUCTION

Deep neural networks (DNNs) have been widely used for various tasks: classification [1], [2], segmentation [3]–[5], recognition [6], [7], caption generation [8]–[10], and translation [11], [12]. IoT devices, which usually have limited computation power, act as clients and send sensor data to the DNN model stored on a cloud server. Such server-client scenarios require high upstream bandwidth. The proposed approach combines the solutions of previous works: the cloud server stores a deep, powerful DNN model, while the edge device keeps a mini-version of that model. Because the model is not partitioned between the cloud server and the edge device, the original sensor data is sent to the cloud only when the confidence of the mini-version's prediction is not high enough. We compare our results with many state-of-the-art model compression methods, including pruning, low-rank approximation, knowledge distillation, and branchy-type networks.
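The confidence-based routing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the edge model outputs softmax logits and uses normalized entropy as the confidence estimator; the names `edge_model`, `cloud_model`, and the threshold value are hypothetical placeholders.

```python
# Minimal sketch of confidence-based edge-cloud routing (assumptions:
# both models return class logits for a single input; the entropy
# threshold is a tuning knob, not a value reported in the paper).
import torch
import torch.nn.functional as F


def normalized_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the softmax output, scaled to [0, 1]."""
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy / torch.log(torch.tensor(float(logits.shape[-1])))


def predict(x, edge_model, cloud_model, threshold: float = 0.3):
    """Run the small edge model first; send the original input to the
    larger cloud model only when the edge prediction is uncertain."""
    edge_logits = edge_model(x)
    if normalized_entropy(edge_logits).item() < threshold:
        return edge_logits.argmax(dim=-1)   # confident: answer locally
    return cloud_model(x).argmax(dim=-1)    # uncertain: offload raw input
```

Because the two models are independent (not a partitioned network), the only data crossing the network in the uncertain case is the original input, which matches the server-client setup outlined above.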

PREVIOUS WORK
METHODOLOGY
ENTROPY
BOOSTING
EXPERIMENTS
CIFAR DATASET
Findings
CONCLUSION