Abstract
In long-tailed recognition tasks, knowledge distillation is widely adopted to improve the performance of deep neural networks. These methods distill knowledge from a pretrained teacher model to a student model, which enables higher long-tailed recognition accuracy. However, the dependence on accompanying assistive models complicates the training of a single network and incurs large memory and time costs. In this work, we present Balanced Self-Distillation (BSD), which distills tail knowledge within a single network and without assistive models. Specifically, BSD distills knowledge between different distortions of the same samples to stimulate the representation learning potential of the single network, and adopts a balanced class weight to shift the distillation focus from head to tail classes. Comprehensive experiments on diverse datasets, including CIFAR-10-LT, CIFAR-100-LT, and TinyImageNet-LT, show that BSD consistently outperforms strong baseline methods. In particular, BSD achieves an improvement of 8.13% on CIFAR-100-LT with an imbalance ratio of 100 over the cross-entropy baseline. Furthermore, the proposed method integrates seamlessly with contemporary techniques such as re-sampling, meta-learning, and cost-sensitive learning, making it a versatile tool for effectively addressing the challenges of long-tailed scenarios.
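To make the core idea concrete, the sketch below illustrates one plausible form of a balanced self-distillation objective: a single network produces predictions for two distortions (augmentations) of the same batch, a consistency term is computed between them, and that term is re-weighted per class so tail classes receive more emphasis. This is a minimal illustration under stated assumptions, not the paper's exact formulation; the inverse-frequency weighting, the symmetric KL form, and the function name `balanced_self_distillation_loss` are all assumptions introduced here.

```python
import torch
import torch.nn.functional as F

def balanced_self_distillation_loss(logits_a, logits_b, targets, class_counts,
                                    temperature=2.0, alpha=1.0):
    """Hypothetical sketch of a balanced self-distillation loss.

    logits_a, logits_b: outputs of the same network for two distortions of
        the same batch, shape (B, C).
    targets: ground-truth labels, shape (B,).
    class_counts: number of training samples per class, shape (C,).
    The weighting scheme and symmetric KL are illustrative assumptions.
    """
    # Standard cross-entropy on one view as the base classification loss.
    ce = F.cross_entropy(logits_a, targets)

    # Per-class weights that grow for rare (tail) classes: a class with
    # average frequency gets weight ~1, tail classes get weights > 1.
    class_counts = class_counts.float()
    weights = class_counts.sum() / (class_counts * len(class_counts))
    sample_w = weights[targets]                                  # (B,)

    # Symmetric KL divergence between temperature-softened predictions
    # of the two distorted views of the same samples.
    log_p_a = F.log_softmax(logits_a / temperature, dim=1)
    log_p_b = F.log_softmax(logits_b / temperature, dim=1)
    kl_ab = F.kl_div(log_p_a, log_p_b.exp(), reduction='none').sum(dim=1)
    kl_ba = F.kl_div(log_p_b, log_p_a.exp(), reduction='none').sum(dim=1)
    distill = (sample_w * 0.5 * (kl_ab + kl_ba) * temperature ** 2).mean()

    return ce + alpha * distill
```

In use, the two logit tensors would come from forwarding two independently augmented copies of each image through the same network, so no teacher model, extra memory for a second network, or separate pretraining stage is required.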