Abstract

Ultra-fine-grained visual categorization (ultra-FGVC) categorizes objects whose inter-class patterns are even more similar than those in fine-grained visual categorization (FGVC); the spectrum of granularity moves down significantly, e.g., from classifying species to classifying cultivars within the same species. It is considered an open research problem mainly due to the following challenges. First, the inter-class differences among images are orders of magnitude smaller (e.g., between cultivars of the same species) than those in current FGVC tasks (e.g., between species). Second, there are only a few samples per category, which is beyond the capacity of most convolutional neural network methods that favor large training datasets. To address these problems, we propose a novel random mask covariance network (MaskCOV), which integrates an auxiliary self-supervised learning module with a powerful in-image data augmentation scheme for ultra-FGVC. Specifically, we first uniformly partition input images into patches and then augment the data by randomly shuffling and masking these patches. On top of that, we introduce an auxiliary self-supervised learning module that predicts the spatial covariance context of these patches to increase the discriminability of our network for classification. Very encouraging experimental results of the proposed method in comparison with state-of-the-art benchmarks demonstrate the superiority and potential of the MaskCOV concept, which pushes the research boundary forward from fine-grained to ultra-fine-grained visual categorization.
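The in-image augmentation described above (partition into patches, then randomly shuffle and mask) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid size, mask ratio, zero-filling of masked patches, and the NumPy-based code are all assumptions made here for clarity.

```python
import numpy as np

def shuffle_and_mask(image, grid=4, mask_ratio=0.25, rng=None):
    """Partition `image` (H, W, C) into a grid x grid layout of patches,
    randomly permute the patches, and zero out a fraction of them.

    Returns the augmented image and the permutation, which could serve as
    a supervision signal for an auxiliary self-supervised head."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    # Collect patches in row-major order.
    patches = [image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw].copy()
               for i in range(grid) for j in range(grid)]
    order = rng.permutation(len(patches))            # random shuffle
    n_mask = int(mask_ratio * len(patches))          # how many to hide
    masked = set(rng.choice(len(patches), n_mask, replace=False))
    out = image.copy()
    for dst, src in enumerate(order):
        # Place the shuffled patch, or a zeroed patch if this slot is masked.
        patch = np.zeros_like(patches[src]) if dst in masked else patches[src]
        i, j = divmod(dst, grid)
        out[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = patch
    return out, order

# Example usage on a synthetic 32x32 RGB image.
img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
aug, order = shuffle_and_mask(img, grid=4, mask_ratio=0.25)
```

The auxiliary module in the paper additionally predicts the spatial covariance context of these patches; only the shuffle-and-mask step is sketched here.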
