Abstract

Fine-grained visual classification (FGVC) is challenging because it requires capturing subtle yet distinct visual cues under large intra-class and small inter-class variance. To this end, we propose a new Siamese Self-supervised Learning method that aligns different views of the same image. Specifically, we employ an attention mechanism to locate the semantic parts of an image and then generate different views with a crop-and-erase strategy. Meanwhile, we adopt a Siamese network to align features across the views and capture view-invariant features in a self-supervised way. Finally, we introduce a center loss to explicitly enforce consistency between different views. Extensive experiments show that the proposed method performs on par with state-of-the-art methods on three public benchmarks: CUB-200-2011, FGVC-Aircraft, and Stanford Cars.
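
To make the pipeline described above concrete, the following is a minimal PyTorch sketch of how such a method could be wired together: an attention map derived from backbone features drives crop-and-erase view generation, a weight-sharing (Siamese) encoder embeds both views, a cosine-similarity alignment loss ties the views together, and a center loss pulls view features toward per-class centers. The function and class names (attention_views, SiameseFGVC), the ResNet-18 backbone, the thresholds, and the loss weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


def attention_views(images, feat_maps, crop_thresh=0.5, erase_thresh=0.8):
    """Build two views from a coarse attention map: a crop that keeps the
    salient parts and an erasure that hides the most salient region."""
    attn = feat_maps.mean(dim=1, keepdim=True)                       # (B,1,h,w)
    attn = F.interpolate(attn, size=images.shape[-2:], mode="bilinear",
                         align_corners=False)
    lo = attn.amin(dim=(2, 3), keepdim=True)
    hi = attn.amax(dim=(2, 3), keepdim=True)
    attn = (attn - lo) / (hi - lo + 1e-6)                            # normalize to [0,1]
    crop_view = images * (attn > crop_thresh).float()                # keep salient parts
    erase_view = images * (attn < erase_thresh).float()              # hide the peak region
    return crop_view, erase_view


class SiameseFGVC(nn.Module):
    """Weight-sharing backbone applied to the original image and both views."""

    def __init__(self, num_classes, feat_dim=512):
        super().__init__()
        backbone = models.resnet18()
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # conv feature maps
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(feat_dim, num_classes)
        # One learnable center per class for the center loss (illustrative).
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def embed(self, x):
        fmap = self.encoder(x)
        feat = self.pool(fmap).flatten(1)
        return fmap, feat

    def forward(self, images, labels):
        fmap, feat = self.embed(images)
        crop_view, erase_view = attention_views(images, fmap)
        _, feat_c = self.embed(crop_view)
        _, feat_e = self.embed(erase_view)

        # Supervised classification on the original image.
        cls_loss = F.cross_entropy(self.fc(feat), labels)
        # Self-supervised alignment: features of the two views should agree.
        align_loss = 1 - F.cosine_similarity(feat_c, feat_e, dim=1).mean()
        # Center loss: pull every view's feature toward its class center.
        centers = self.centers[labels]
        center_loss = ((feat - centers).pow(2).sum(1) +
                       (feat_c - centers).pow(2).sum(1) +
                       (feat_e - centers).pow(2).sum(1)).mean()
        return cls_loss + align_loss + 0.01 * center_loss
```

The single returned scalar can be fed directly to an optimizer step; at inference time only the original image and the classifier head would be used.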
