Abstract

Few-shot learning suffers from the scarcity of labeled training data. Treating the local descriptors of an image as representations of that image can greatly augment the existing labeled training data. Existing local-descriptor-based few-shot learning methods exploit this fact but ignore that the semantics exhibited by individual local descriptors may not be relevant to the image-level semantics. In this paper, we address this issue from a new perspective: imposing semantic consistency on the local descriptors of an image. Our proposed method consists of three modules. The first is a local descriptor extractor, which extracts a large number of local descriptors in a single forward pass. The second is a local descriptor compensator, which compensates each local descriptor with the image-level representation so as to align the semantics of the local descriptors with the image-level semantics. The third is a local-descriptor-based contrastive loss, which supervises the learning of the whole pipeline and aims to make the semantics carried by the local descriptors of an image relevant to and consistent with the image-level semantics. Theoretical analysis demonstrates the generalization ability of the proposed method, and comprehensive experiments on benchmark datasets show that it achieves semantic consistency of local descriptors as well as state-of-the-art performance.
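
The abstract does not fix an implementation, so the following is a minimal, hypothetical PyTorch sketch of how the three modules could fit together. Every concrete choice below is an assumption rather than the authors' method: the backbone, the use of global average pooling as the image-level representation, the linear fusion inside the compensator, the supervised contrastive form of the loss, and the temperature value.

```python
# Hypothetical sketch of the three modules described in the abstract (PyTorch).
# All names, shapes, fusion choices, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalDescriptorPipeline(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int = 640):
        super().__init__()
        # (1) Local descriptor extractor: a convolutional backbone whose final
        #     feature map of shape (B, C, H, W) yields H*W local descriptors per image.
        self.backbone = backbone
        # (2) Local descriptor compensator: fuses each local descriptor with the
        #     image-level representation via a learnable projection (assumed fusion).
        self.compensator = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        fmap = self.backbone(images)                    # (B, C, H, W)
        B, C, H, W = fmap.shape
        local = fmap.flatten(2).transpose(1, 2)         # (B, H*W, C) local descriptors
        global_rep = fmap.mean(dim=(2, 3))              # (B, C) image-level representation
        global_exp = global_rep.unsqueeze(1).expand(-1, H * W, -1)
        # Compensate every local descriptor with the image-level representation.
        fused = self.compensator(torch.cat([local, global_exp], dim=-1))
        return F.normalize(fused, dim=-1)               # (B, H*W, C), unit-norm descriptors


def local_descriptor_contrastive_loss(desc: torch.Tensor,
                                      labels: torch.Tensor,
                                      temperature: float = 0.1) -> torch.Tensor:
    """(3) A supervised contrastive loss over local descriptors: descriptors from
    images of the same class are positives, all other descriptors are negatives."""
    B, N, C = desc.shape
    z = desc.reshape(B * N, C)                          # all local descriptors in the batch
    y = labels.repeat_interleave(N)                     # each descriptor inherits its image label
    sim = z @ z.t() / temperature                       # cosine similarities (descriptors are normalized)
    logits_mask = ~torch.eye(B * N, dtype=torch.bool, device=z.device)
    pos_mask = (y.unsqueeze(0) == y.unsqueeze(1)) & logits_mask
    # Exclude each descriptor itself from the denominator.
    exp_sim = torch.exp(sim) * logits_mask
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True))
    # Average log-probability over the positives of each anchor descriptor.
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()
```

In this sketch, the positives of a local descriptor are all local descriptors coming from images of the same class, which is one plausible way to pull per-descriptor semantics toward the image-level label; the paper's exact loss formulation may differ.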
