Abstract

Few-shot fine-grained recognition (FS-FGR) aims to distinguish highly similar objects from different sub-categories with limited supervision. However, traditional few-shot learning solutions typically exploit image-level features and focus on capturing global silhouettes while neglecting local details, which inevitably loses inconspicuous but discriminative information. How to effectively address fine-grained recognition given limited samples therefore remains a major challenge. In this article, we propose an effective bidirectional pyramid architecture that enhances the internal feature representations for fine-grained image recognition in the few-shot learning scenario. Specifically, we deploy a multi-scale feature pyramid and a multi-level attention pyramid on the backbone network and progressively aggregate features from different granular spaces via both of them. We further present an attention-guided refinement strategy that works in collaboration with the multi-level attention pyramid to reduce the uncertainty introduced by backgrounds when samples are limited. In addition, the proposed method is trained within the meta-learning framework in an end-to-end fashion without any extra supervision. Extensive experimental results on four challenging and widely used fine-grained benchmarks show that the proposed method performs favorably against state-of-the-art methods, especially in one-shot scenarios.

