Abstract

Fine-grained visual classification (FGVC) is challenging because it requires capturing subtle yet distinct visual cues under large intra-class and small inter-class variance. To this end, we propose a new Siamese Self-supervised Learning method that aligns different views of the same image. Specifically, we employ an attention mechanism to locate the semantic parts of an image and then generate different views with a crop-and-erase strategy. Meanwhile, we adopt a Siamese network to align features across the views and capture view-invariant features in a self-supervised way. Finally, we introduce a center loss to explicitly enforce consistency between different views. Extensive experiments show that the proposed method performs on par with state-of-the-art methods on three public benchmarks: CUB-200-2011, FGVC-Aircraft, and Stanford Cars.
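
To make the pipeline described above concrete, the following is a minimal PyTorch sketch of how such a method could be wired together: an attention map derived from backbone features drives crop-and-erase view generation, a weight-sharing (Siamese) encoder embeds both views, a cosine-similarity alignment loss ties the views together, and a center loss pulls view features toward per-class centers. The function and class names (attention_views, SiameseFGVC), the ResNet-18 backbone, the thresholds, and the loss weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


def attention_views(images, feat_maps, crop_thresh=0.5, erase_thresh=0.8):
    """Build two views from a coarse attention map: a crop that keeps the
    salient parts and an erasure that hides the most salient region."""
    attn = feat_maps.mean(dim=1, keepdim=True)                       # (B,1,h,w)
    attn = F.interpolate(attn, size=images.shape[-2:], mode="bilinear",
                         align_corners=False)
    lo = attn.amin(dim=(2, 3), keepdim=True)
    hi = attn.amax(dim=(2, 3), keepdim=True)
    attn = (attn - lo) / (hi - lo + 1e-6)                            # normalize to [0,1]
    crop_view = images * (attn > crop_thresh).float()                # keep salient parts
    erase_view = images * (attn < erase_thresh).float()              # hide the peak region
    return crop_view, erase_view


class SiameseFGVC(nn.Module):
    """Weight-sharing backbone applied to the original image and both views."""

    def __init__(self, num_classes, feat_dim=512):
        super().__init__()
        backbone = models.resnet18()
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # conv feature maps
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(feat_dim, num_classes)
        # One learnable center per class for the center loss (illustrative).
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def embed(self, x):
        fmap = self.encoder(x)
        feat = self.pool(fmap).flatten(1)
        return fmap, feat

    def forward(self, images, labels):
        fmap, feat = self.embed(images)
        crop_view, erase_view = attention_views(images, fmap)
        _, feat_c = self.embed(crop_view)
        _, feat_e = self.embed(erase_view)

        # Supervised classification on the original image.
        cls_loss = F.cross_entropy(self.fc(feat), labels)
        # Self-supervised alignment: features of the two views should agree.
        align_loss = 1 - F.cosine_similarity(feat_c, feat_e, dim=1).mean()
        # Center loss: pull every view's feature toward its class center.
        centers = self.centers[labels]
        center_loss = ((feat - centers).pow(2).sum(1) +
                       (feat_c - centers).pow(2).sum(1) +
                       (feat_e - centers).pow(2).sum(1)).mean()
        return cls_loss + align_loss + 0.01 * center_loss
```

The single returned scalar can be fed directly to an optimizer step; at inference time only the original image and the classifier head would be used.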
