This work explores visual recognition models on real-world datasets that exhibit a long-tailed distribution. Most previous work takes a holistic perspective, in which the overall gradient used to train the model is obtained by considering all classes jointly. However, because of the extreme data imbalance in long-tailed datasets, considering different classes jointly tends to induce a gradient distortion problem: the direction of the overall gradient shifts toward data-rich classes, and its variance is enlarged by data-poor classes. This gradient distortion impairs model training. To avoid these drawbacks, we propose to disentangle the overall gradient and to consider the gradient on data-rich classes and the gradient on data-poor classes separately. We tackle the long-tailed visual recognition problem with a dual-phase method. In the first phase, model parameters are updated using only the separated gradient on data-rich classes. In the second phase, the remaining data-poor classes are introduced to learn a complete classifier for all classes. To ensure a smooth transition from phase I to phase II, we further propose an exemplar bank and a memory-retentive loss. The exemplar bank stores a few representative examples from data-rich classes and is used to retain information about those classes during the transition. The memory-retentive loss constrains the change in model parameters from phase I to phase II, based on the exemplar bank and the data-poor classes. Extensive experiments on four commonly used long-tailed benchmarks (CIFAR100-LT, Places-LT, ImageNet-LT, and iNaturalist 2018) demonstrate the strong performance of the proposed method.
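To make the notion of disentangling the overall gradient concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear softmax classifier trained with cross-entropy on a synthetic imbalanced batch; the names `ce_grad_sum` and `split_gradient`, the class counts, and the feature dimension are all hypothetical choices made for illustration. It shows only that the batch-mean gradient is the sum of a data-rich part and a data-poor part, so the two can be inspected or applied separately.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_grad_sum(W, X, y):
    """Summed cross-entropy gradient w.r.t. W for logits X @ W."""
    P = softmax(X @ W)
    P[np.arange(len(y)), y] -= 1.0  # softmax output minus one-hot target
    return X.T @ P

def split_gradient(W, X, y, head_classes):
    """Return (overall, head part, tail part), all divided by the full batch size."""
    is_head = np.isin(y, head_classes)
    n = len(y)
    g_all = ce_grad_sum(W, X, y) / n
    g_head = ce_grad_sum(W, X[is_head], y[is_head]) / n
    g_tail = ce_grad_sum(W, X[~is_head], y[~is_head]) / n
    return g_all, g_head, g_tail

# Hypothetical imbalanced batch: class 0 is data-rich, classes 1 and 2 are data-poor.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], [95, 3, 2])
X = rng.normal(size=(len(y), 4))
W = np.zeros((4, 3))

g_all, g_head, g_tail = split_gradient(W, X, y, head_classes=[0])

def cosine(a, b):
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

# Compare how closely the overall gradient aligns with each part.
print("cos(overall, head):", cosine(g_all, g_head))
print("cos(overall, tail):", cosine(g_all, g_tail))
```

Under these assumptions, a phase-I-style update would use only `g_head`. How the second phase and the memory-retentive loss combine the exemplar bank with data-poor classes is not specified here, so the sketch does not attempt to reproduce them.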