Abstract

In recent years, deep learning models have been used successfully in almost every field, in both industry and academia, especially for computer vision tasks. However, these models are huge in size, with millions (or even billions) of parameters, and thus cannot be deployed on systems and devices with limited resources (e.g., embedded systems and mobile phones). To tackle this, several model compression and acceleration techniques have been proposed. As a representative example, knowledge distillation offers a way to effectively learn a small student model from one or more large teacher models, and it has attracted increasing attention owing to its promising performance. In this work, we propose an ensemble model that combines feature-based, response-based, and relation-based lightweight knowledge distillation models for simple image classification tasks. In our knowledge distillation framework, we use ResNet-20 as the student network and ResNet-110 as the teacher network. Experimental results demonstrate that our proposed ensemble model outperforms other knowledge distillation models as well as the large teacher model on image classification tasks, while requiring less computational power than the teacher model.
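The response-based branch of such a framework typically follows the classic softened-logit distillation objective. The sketch below is a minimal, hedged illustration of that loss in PyTorch, assuming a ResNet-20 student and a ResNet-110 teacher as in the paper; the temperature T and weight alpha are illustrative choices, not values reported by the authors.

```python
import torch
import torch.nn.functional as F

def response_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Softened-logit (response-based) distillation loss.

    Combines the KL divergence between temperature-softened student and
    teacher distributions with the usual cross-entropy on ground-truth
    labels. T and alpha are illustrative hyperparameters (assumptions).
    """
    # KL term: student log-probabilities vs. teacher probabilities at temperature T.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Standard supervised cross-entropy on hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```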

Highlights

  • During the last few years, deep learning models have been used successfully in numerous industrial and academic fields, including computer vision [1,2], reinforcement learning [3], and natural language processing [4]

  • We propose an ensemble model that combines three lightweight models learned by three different knowledge distillation strategies on the CIFAR-10 and CIFAR-100 datasets, which are widely used benchmarks for image classification

  • We provide an extensive evaluation of 20 different knowledge distillation methods and our proposed ensemble method



Introduction

During the last few years, deep learning models have been used successfully in numerous industrial and academic fields, including computer vision [1,2], reinforcement learning [3], and natural language processing [4]. Most deep learning models are too computationally expensive to run on resource-limited devices such as mobile phones and embedded devices. To overcome these limitations, several model compression techniques (e.g., low-rank factorization [5,6,7], parameter sharing and pruning [8,9,10,11,12,13,14,15,16,17,18,19], and transferred/compact convolutional filters [20,21,22,23,24,25]) have been proposed to reduce model size while still providing similar performance. Knowledge distillation, which transfers knowledge from a large teacher model to a small student model, is another representative approach; it provides greater architectural flexibility since structural differences between the teacher and the student are allowed.
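As a rough illustration of how three independently distilled students (feature-, response-, and relation-based) might be combined at inference time, the sketch below averages their softmax outputs. This simple averaging rule is an assumption made for illustration only; the paper's actual combination strategy may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_predict(students, images):
    """Average the softmax outputs of several distilled student networks.

    `students` is a list of trained models (e.g., three ResNet-20 students
    distilled with feature-, response-, and relation-based objectives).
    Simple averaging is an illustrative assumption, not the paper's
    confirmed combination rule.
    """
    probs = torch.stack([F.softmax(m(images), dim=1) for m in students])
    return probs.mean(dim=0).argmax(dim=1)  # predicted class indices
```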
