Abstract

Webly supervised fine-grained image recognition (FGIR) learns to distinguish subordinate categories from web-retrieved data, which can greatly reduce the dependence on manually annotated labels. The task is challenging because of heavy label noise and the inherent dilemma of small inter-class variance and large intra-class variance. Current webly supervised algorithms learn holistic category prototypes to help correct noisy labels, but they ignore the local features that distinguish subordinate categories. In this work, we propose a Graph Representation and Prototype Learning (GRPL) framework that automatically mines discriminative local regions and their interactions with the holistic image. From these it learns both an instance graph representation and a category graph prototype, which help correct noisy labels and identify out-of-distribution (OOD) samples. Specifically, an attention-focused module extracts the discriminative regions. A structured graph is then built for each instance to correlate these regions with the holistic image, and an identically structured graph models holistic-local correlations for each category. Next, we apply two stacked graph convolution networks to explore holistic-local interactions within each graph and across the two graphs, yielding a graph representation for each instance and a graph prototype for each category. Finally, the similarities between the instance-level graph representations and the category-level graph prototypes are learned to help correct noisy labels and exclude OOD samples. Extensive experiments on several datasets show that the proposed approach achieves superior performance to current leading algorithms.
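To make the last step of the abstract more concrete, the sketch below shows one simplified way that similarity-based label correction and OOD rejection could work. It is not the GRPL implementation. It makes several assumptions that are not in the abstract:

* Each instance graph is a star. A holistic node at index 0 is linked to K region nodes.
* The two stacked graph convolution networks are reduced to a single standard GCN layer, ReLU(D^-1/2 A D^-1/2 X W), applied to that one graph. The cross-graph interaction is omitted.
* Category prototypes are normalized means of instance embeddings rather than learned category graphs.

The function names `star_adjacency`, `gcn_layer`, `graph_embedding`, `class_prototypes`, and `correct_label` are hypothetical. So are the thresholds `ood_thresh` and `margin`.

```python
import numpy as np


def star_adjacency(num_regions: int) -> np.ndarray:
    """Hypothetical star graph: holistic node (index 0) linked to every region node, plus self-loops."""
    n = num_regions + 1
    adj = np.eye(n)
    adj[0, 1:] = 1.0
    adj[1:, 0] = 1.0
    return adj


def gcn_layer(x: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One standard graph convolution: ReLU(D^-1/2 A D^-1/2 X W)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    adj_hat = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(adj_hat @ x @ w, 0.0)


def graph_embedding(node_feats: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Row 0 = holistic feature, rows 1..K = region features; returns an L2-normalized graph vector."""
    adj = star_adjacency(node_feats.shape[0] - 1)
    h = gcn_layer(node_feats, adj, w)
    g = h.mean(axis=0)  # mean-pool node features into a single graph vector (assumed pooling)
    return g / (np.linalg.norm(g) + 1e-12)


def class_prototypes(embs: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Assumed prototype: normalized mean of the embeddings currently assigned to each class."""
    protos = np.zeros((num_classes, embs.shape[1]))
    for c in range(num_classes):
        members = embs[labels == c]
        if len(members):
            protos[c] = members.mean(axis=0)
    norms = np.linalg.norm(protos, axis=1, keepdims=True)
    return protos / np.maximum(norms, 1e-12)  # empty classes stay all-zero


def correct_label(emb, protos, noisy_label, ood_thresh=0.3, margin=0.1):
    """Return a (possibly corrected) label, or None if the sample looks out-of-distribution."""
    sims = protos @ emb  # cosine similarity, since both sides are unit-norm
    if sims.max() < ood_thresh:
        return None  # far from every prototype: treat as OOD and exclude
    best = int(np.argmax(sims))
    if best != noisy_label and sims[best] - sims[noisy_label] > margin:
        return best  # clearly closer to another category: relabel
    return noisy_label
```

The `margin` keeps the web label unless another prototype is clearly closer. This guards against flipping labels on ambiguous fine-grained samples. The `ood_thresh` check discards images that resemble no category at all. Both values would need tuning in practice.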
