Granularity-aware distillation and structure modeling region proposal network for fine-grained image classification

Xiao Ke,Yuhang Cai,Baitao Chen,Hao Liu,Wenzhong Guo

doi:10.1016/j.patcog.2023.109305

Abstract

Fine-grained visual classification (FGVC) aims to identify objects belonging to multiple sub-categories of the same super-category. The key to solving fine-grained classification problems is to learn discriminative visual feature representation with only subtle differences. Although previous work based on refined feature learning has made great progress, however, high-level semantic features often lack key information for fine-grained visual object nuances. How to efficiently integrate semantic information of different granularities from classification networks is a critical. In this paper, we propose Granularity-aware Distillation and Structure Modeling region Proposal Network(GDSMP-Net). Our solution integrates multi-granularity hierarchical information through a multi-granularity fusion learning strategy to enhance feature representation. In view of the inherent challenges of large intra-class differences in FGVC, a cross-layer self-distillation regularization is proposed to to strengthen the connection between high-level semantics and low-level semantics for robust multi-granularity feature learning. On this basis, we use a weakly supervised method to generate local branches, and the collaborative learning of discriminative semantics and structural semantics based on local regions, facilitating model to perceive contextual information to capture structural interactions between local semantics. Comprehensive experiments show that our method achieves state-of-the-art performance on four widely-used challenging datasets.(CUB-200-2011, Stanford Cars, FGVC-Aircraft and NA-birds).

Full Text