Abstract

In long-tailed recognition tasks, knowledge distillation is widely adopted to improve the performance of deep neural networks. These methods distill knowledge from a pretrained teacher model to a student model, enabling higher long-tailed recognition accuracy. However, the dependence on accompanying assistive models complicates the training of a single network and incurs large memory and time costs. In this work, we present Balanced Self-Distillation (BSD), which distills tail knowledge within a single network without assistive models. Specifically, BSD distills knowledge between different distortions of the same samples to stimulate the representation learning potential of the single network, and adopts balanced class weights to shift the distillation focus from head to tail classes. Comprehensive experiments across diverse datasets, including CIFAR-10-LT, CIFAR-100-LT, and TinyImageNet-LT, show that BSD consistently outperforms strong baseline methods. In particular, BSD improves accuracy by 8.13% on CIFAR-100-LT with an imbalance ratio of 100 compared to the cross-entropy baseline. Furthermore, the proposed method integrates seamlessly with contemporary techniques such as re-sampling, meta-learning, and cost-sensitive learning, making it a versatile tool for addressing the challenges of long-tailed scenarios.
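The abstract does not specify the exact loss formulation, so the following PyTorch-style sketch is only a rough illustration of the idea described above: a self-distillation term between two distorted views of the same samples, reweighted by per-class weights that emphasize tail classes. The function name, the temperature parameter, and the specific weighting scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def balanced_self_distillation_loss(logits_view_a, logits_view_b, class_weights, T=2.0):
    """Hypothetical sketch of a balanced self-distillation term.

    logits_view_a, logits_view_b: outputs of the *same* network for two
        distorted (augmented) views of the same batch, shape (B, C).
    class_weights: per-class weights biased toward tail classes, shape (C,).
    T: softmax temperature (assumed hyperparameter).
    """
    # Treat one view as the soft "teacher" target; stop gradients through it.
    p_teacher = F.softmax(logits_view_a.detach() / T, dim=1)
    log_p_student = F.log_softmax(logits_view_b / T, dim=1)

    # Per-class KL contributions, reweighted so tail classes dominate the loss.
    kl_per_class = p_teacher * (torch.log(p_teacher + 1e-8) - log_p_student)
    weighted_kl = (kl_per_class * class_weights.unsqueeze(0)).sum(dim=1)

    # Scale by T^2, as is conventional for temperature-based distillation.
    return (T * T) * weighted_kl.mean()
```

In practice such a term would be combined with a standard (possibly re-weighted) classification loss; the class weights could, for example, be derived from inverse class frequencies, though the paper's exact choice is not stated in the abstract.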
