Abstract

This paper presents SPARE, a self-supervised part erasing framework for ultra-fine-grained visual categorization. The key insight of our model is to learn discriminative representations via a self-supervised module that randomly erases parts of the input and predicts the contextual positions of the erased parts. This drives the network to exploit the intrinsic structure of the data, i.e., to understand and recognize the contextual information of objects, thereby facilitating more discriminative part-level representations. It also enhances the learning capability of the model by introducing more diverse, semantically meaningful training part segments. We demonstrate that our approach achieves strong performance on seven publicly available datasets covering both ultra-fine-grained and fine-grained visual categorization tasks.
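
To make the erase-and-predict pretext task concrete, the following is a minimal sketch in PyTorch. It is not the paper's implementation: the class name PartErasingPretext, the 4x4 part grid, the ResNet-50 backbone, and the equal-weighted joint loss are all illustrative assumptions. The sketch erases one randomly chosen grid cell per image and trains a head to predict which cell was erased, alongside the usual category head.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class PartErasingPretext(nn.Module):
    """Hypothetical sketch of a part-erasing pretext task:
    erase one random grid cell and predict its position."""

    def __init__(self, grid=4, num_classes=200):
        super().__init__()
        self.grid = grid
        backbone = models.resnet50(weights=None)  # backbone choice is an assumption
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()               # expose pooled features
        self.backbone = backbone
        self.cls_head = nn.Linear(feat_dim, num_classes)   # category prediction
        self.pos_head = nn.Linear(feat_dim, grid * grid)   # erased-position prediction

    def erase_random_part(self, x):
        """Zero out one randomly chosen grid cell per image."""
        b, _, h, w = x.shape
        ph, pw = h // self.grid, w // self.grid
        pos = torch.randint(0, self.grid * self.grid, (b,), device=x.device)
        x = x.clone()
        for i in range(b):
            r, c = divmod(pos[i].item(), self.grid)
            x[i, :, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = 0.0
        return x, pos

    def forward(self, x):
        x_erased, pos = self.erase_random_part(x)
        feat = self.backbone(x_erased)
        return self.cls_head(feat), self.pos_head(feat), pos

# Usage: joint supervised + self-supervised objective (equal weighting assumed).
model = PartErasingPretext()
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 200, (8,))
cls_logits, pos_logits, pos = model(images)
loss = (nn.functional.cross_entropy(cls_logits, labels)
        + nn.functional.cross_entropy(pos_logits, pos))
```

Predicting the erased cell's position forces the features to encode where parts sit relative to one another, which is one plausible reading of how the contextual prediction task sharpens part-level representations.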
