Abstract

As a long-standing research area, class incremental learning (CIL) aims to learn a unified classifier as the number of classes grows. Fine-grained visual categorization (FGVC) is a challenging visual task characterized by small inter-class variances and large intra-class variances, and it has not attracted enough attention in CIL. Because of these characteristics, localizing critical regions specialized for fine-grained object recognition plays a crucial role in FGVC, and learning fine-grained features from those critical regions is essential in fine-grained CIL for recognizing new object classes. This paper designs a network architecture named the two-branch attention learning network (TBAL-Net) for fine-grained CIL. TBAL-Net localizes critical regions and learns fine-grained feature representations with a lightweight attention module. An effective training framework for fine-grained CIL is built by integrating TBAL-Net into an effective CIL process. The framework is evaluated on three popular fine-grained object datasets: CUB-200-2011, FGVC-Aircraft, and Stanford-Car. Comparative experiments show that the proposed framework achieves state-of-the-art performance on all three datasets.
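The full text specifies the exact TBAL-Net architecture; the short PyTorch sketch below only illustrates the general idea stated in the abstract, a shared backbone feeding two branches, one of which is re-weighted by a lightweight attention module to emphasize critical regions. The class names, the SE-style attention, and all layer sizes are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a two-branch design with a lightweight attention module.
# This is an illustrative assumption, NOT the authors' TBAL-Net implementation:
# the module names, the SE-style attention, and all sizes are placeholders.
import torch
import torch.nn as nn
from torchvision import models


class LightweightAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (an assumed, lightweight choice)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # re-weight feature maps toward critical regions


class TwoBranchAttentionNet(nn.Module):
    """Two branches over a shared backbone: raw global features + attention-refined features."""

    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # (B, 512, H, W)
        self.attention = LightweightAttention(512)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(512 * 2, num_classes)  # concatenated branch features

    def forward(self, x):
        f = self.features(x)
        raw = self.pool(f).flatten(1)                  # global branch
        att = self.pool(self.attention(f)).flatten(1)  # attention branch (critical regions)
        return self.classifier(torch.cat([raw, att], dim=1))
```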

Highlights

  • In the real world, a visual system may involve constantly emerging new objects

  • It is observed that multi-branch and multi-scale attention learning (MMAL) performs better than other methods in the initial two phases, showing its ability to capture more distinguishable patterns at the beginning of class incremental learning (CIL) training, when the number of classes is relatively low

  • TBAL-Net with the CNN prediction outperforms TBAL-Net with the NME prediction, showing that the former extracts more discriminative fine-grained features (the two prediction rules are illustrated in the sketch after these highlights)
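
For context on the last highlight, the CNN and NME prediction rules commonly used in CIL evaluation differ only in how class scores are produced at test time: CNN uses the trained linear head, while NME (nearest mean of exemplars) compares a query feature with per-class exemplar means. The snippet below is a minimal sketch of these two rules under assumed tensor shapes; it is not taken from the paper.

```python
# Minimal sketch (illustrative assumption, not the authors' code) of the two
# prediction rules compared in the highlight above.
import torch
import torch.nn.functional as F


def cnn_predict(classifier: torch.nn.Linear, features: torch.Tensor) -> torch.Tensor:
    """'CNN' prediction: score classes with the trained linear head."""
    return classifier(features).argmax(dim=1)


def nme_predict(features: torch.Tensor, class_means: torch.Tensor) -> torch.Tensor:
    """'NME' prediction: nearest mean of exemplars in normalized feature space.

    features:    (N, D) query features
    class_means: (C, D) per-class feature means computed from stored exemplars
    """
    f = F.normalize(features, dim=1)
    m = F.normalize(class_means, dim=1)
    # cosine distance to each class mean; the closest mean gives the predicted class
    return (1.0 - f @ m.t()).argmin(dim=1)
```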


Summary

Introduction

In the real world, a visual system may involve constantly emerging new objects. The visual system should be able to maintain its recognition performance on existing objects while it keeps learning to recognize new ones [1]. A straightforward approach in computer vision is to finetune pretrained models, such as VGG [2], Inception [3,4], or ResNet [5], on a new training dataset for the recognition of new objects. This may lead to a common issue: catastrophic forgetting. A pretrained model finetuned on a new dataset suffers a considerable performance drop on previous datasets. Existing CIL methods [9,10,11,12,13]
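
To make the straightforward approach above concrete, the sketch below (an assumed torchvision setup, not the paper's protocol) finetunes a pretrained ResNet on a new dataset only: since no term in the objective preserves behaviour on the old classes, the weights are free to drift, which is the catastrophic forgetting issue described here.

```python
# Minimal sketch (assumed setup, not from the paper): naive finetuning of a
# pretrained ResNet on new classes only, the recipe that invites catastrophic
# forgetting because nothing constrains performance on previously learned data.
import torch.nn as nn
import torch.optim as optim
from torchvision import models

num_new_classes = 50  # hypothetical size of the new task

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_new_classes)  # replace the head

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()


def finetune_step(images, labels):
    """One SGD step driven only by the new dataset."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```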
