Abstract

In this paper, we propose an efficient knowledge distillation method to train light networks using heavy networks for semantic segmentation. Most semantic segmentation networks that exhibit good accuracy are based on computationally expensive networks. These networks are not suitable for mobile applications using vision sensors, because computational resources are limited in these environments. In this view, knowledge distillation, which transfers knowledge from heavy networks acting as teachers to light networks acting as students, is a suitable methodology. Although previous knowledge distillation approaches have been proven to improve the performance of student networks, most methods have some limitations. First, they tend to use only the spatial correlation of feature maps and ignore the relational information of their channels. Second, they can transfer false knowledge when the results of the teacher networks are not perfect. To address these two problems, we propose two loss functions: a channel and spatial correlation (CSC) loss function and an adaptive cross-entropy (ACE) loss function. The former computes the full relationship of both the channel and spatial information in the feature map, and the latter adaptively exploits one-hot encodings using the ground-truth labels and the probability maps predicted by the teacher network. To evaluate our method, we conduct experiments on scene parsing datasets: Cityscapes and CamVid. Our method achieves significantly better performance than previous methods.
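To make the two losses concrete, the following is a minimal NumPy sketch of how they could be realized. The shapes, the cosine-normalized Gram matrices used for the correlations, and the rule that falls back to one-hot ground truth wherever the teacher's prediction is wrong are all assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def csc_similarity(feat):
    """Channel (C x C) and spatial (HW x HW) correlation matrices
    of a single feature map of shape (C, H, W). Sketch only."""
    C, H, W = feat.shape
    f = feat.reshape(C, H * W)
    # Channel correlation: cosine similarity between channel vectors.
    fc = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
    channel_corr = fc @ fc.T
    # Spatial correlation: cosine similarity between pixel descriptors.
    fs = f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-8)
    spatial_corr = fs.T @ fs
    return channel_corr, spatial_corr

def csc_loss(student_feat, teacher_feat):
    """Match both correlation structures of student and teacher (MSE)."""
    cs, ss = csc_similarity(student_feat)
    ct, st = csc_similarity(teacher_feat)
    return np.mean((cs - ct) ** 2) + np.mean((ss - st) ** 2)

def ace_loss(student_prob, teacher_prob, gt, eps=1e-8):
    """Adaptive cross-entropy: where the teacher's argmax agrees with the
    ground truth, distill its soft probability map; elsewhere fall back
    to the one-hot ground-truth label. Hypothetical mixing rule.
    student_prob, teacher_prob: (K, H, W) softmax outputs; gt: (H, W) ints."""
    K, H, W = student_prob.shape
    one_hot = np.eye(K)[gt].transpose(2, 0, 1)          # (K, H, W)
    teacher_correct = (teacher_prob.argmax(axis=0) == gt)  # (H, W)
    target = np.where(teacher_correct[None], teacher_prob, one_hot)
    return -np.mean(np.sum(target * np.log(student_prob + eps), axis=0))
```

In a training loop, `csc_loss` would be applied to intermediate feature maps (after projecting the student's features to the teacher's channel width if they differ) and `ace_loss` to the final softmax outputs, alongside the usual supervised loss.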

Highlights

  • Semantic segmentation is a pixel-wise classification problem that determines a predefined class for each pixel in an image

  • We explore knowledge distillation for training compact semantic segmentation networks using heavy networks

  • We present two distillation loss functions: channel and spatial correlation (CSC) distillation loss and adaptive cross-entropy (ACE) distillation loss

Introduction

Semantic segmentation is a pixel-wise classification problem that determines a predefined class (or label) for each pixel in an image. This is a fundamental problem in the field of computer vision, and it can be applied to numerous real-world applications of vision sensors, including virtual reality, augmented reality, autonomous vehicles, and aerial and satellite image analysis. Numerous semantic segmentation methods that have exhibited reasonable performance are based on deep neural network algorithms. The deeper and wider the networks, the more accurate the results. Most of these methods focus on accuracy under all scenarios.
