Abstract
Recently, deep learning has achieved state-of-the-art performance in many areas, surpassing traditional machine-learning methods based on shallow architectures. However, achieving higher accuracy usually requires extending the network depth or ensembling the results of different neural networks, both of which increase the demand for memory and computing resources. This makes it difficult to deploy deep-learning models in resource-constrained scenarios such as drones, mobile phones, and autonomous driving. Improving network performance without expanding the network scale has therefore become a popular research topic. In this paper, we propose a cross-architecture online-distillation approach that addresses this problem by transferring supplementary information between networks of different architectures. We use an ensemble method to aggregate networks of different structures, thus forming better teachers than traditional distillation methods. In addition, discontinuous distillation with progressively enhanced constraints replaces fixed distillation in order to reduce the loss of information diversity during distillation. Our training method improves the distillation effect and achieves strong improvements in network performance. We used several popular models to validate the results: on the CIFAR100 dataset, the accuracy of AlexNet improved by 5.94%, VGG by 2.88%, ResNet by 5.07%, and DenseNet by 1.28%. Extensive experiments demonstrate the effectiveness of the proposed method; on the CIFAR10, CIFAR100, and ImageNet datasets, we observed significant improvements over traditional knowledge distillation.
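The abstract outlines two ingredients: an ensemble teacher built from the outputs of peer networks with different architectures, and a distillation constraint that is progressively strengthened over training. The sketch below illustrates how such a loss could be assembled in PyTorch; the function names, the logit-averaging ensemble, the linear weight ramp, and the hyperparameters (temperature, alpha_max) are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def ensemble_teacher_logits(peer_logits):
    # Hypothetical ensemble teacher: average the logits of all peer networks.
    return torch.mean(torch.stack(peer_logits, dim=0), dim=0)

def progressive_alpha(epoch, total_epochs, alpha_max=0.7):
    # Illustrative "progressively enhanced constraint": the distillation weight
    # grows linearly with training progress (the paper's actual schedule is not
    # reproduced here).
    return alpha_max * min(1.0, epoch / max(1, total_epochs - 1))

def online_distillation_loss(student_logits, peer_logits, targets,
                             epoch, total_epochs, temperature=3.0):
    # Hard-label cross-entropy against the ground-truth targets.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label KL divergence towards the (detached) ensemble teacher.
    teacher = ensemble_teacher_logits(peer_logits).detach()
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    alpha = progressive_alpha(epoch, total_epochs)
    return (1.0 - alpha) * ce + alpha * kd

In an online setting of this kind, each network in the cohort would compute such a loss against the ensemble of its peers, so every model acts as both student and teacher during joint training.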
Highlights
The development of deep learning [1,2] has led to a leap in the fields of computer vision [3,4,5,6] and natural language processing [7,8,9,10,11].
Top-1 accuracy improved by 5.94% (from 43.85% to 49.79%) for AlexNet when trained together with VGG.
This suggests that our online knowledge distillation is broadly effective across different architectures.
Summary
The development of deep learning [1,2] has led to a leap in the fields of computer vision [3,4,5,6] and natural language processing [7,8,9,10,11]. In image recognition in particular [4,12,13], recognition accuracy has reached a high level by using deep-learning methods. However, the huge demand for resources is an important obstacle to the adoption of deep-learning models in industry. Training a deeper network or merging multiple models [16,17] may achieve better performance, but it cannot avoid the growth of resource consumption. The problem of how to improve performance without increasing network size has therefore received extensive attention, and training methods such as model compression [18,19] and model pruning [20] have been proposed to address it.