Abstract
Metric learning is one feasible approach to few-shot learning. However, most metric learning methods encode images directly with a CNN, without considering image content, and such generic CNN features may make distinct classes hard to discriminate. Based on the observation that feature maps correspond to image regions, we assume that image regions relevant to the target objects should be salient in image features. To this end, we propose an effective framework, called Spatial Attention Network (SAN), that exploits the spatial context of images. SAN produces attention weights on clustered regional features, indicating the contribution of each region to classification, and takes the weighted sum of the regional features as the discriminative feature. SAN thus highlights important content by assigning it large weights. Once trained, SAN compares unlabeled samples with class prototypes computed from the few labeled samples in a nearest-neighbor manner to identify the classes of the unlabeled samples. We evaluate our approach on three disparate datasets: miniImageNet, Caltech-UCSD Birds and miniDogsNet. Experimental results show that, compared with state-of-the-art models, SAN achieves competitive accuracy on miniImageNet and Caltech-UCSD Birds, and it improves 5-shot accuracy on miniDogsNet by a large margin.
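The attend-then-compare pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' SAN implementation, and several pieces are assumptions: each image is taken to be already represented by a few regional feature vectors (the clustering step is omitted), each region is scored by a dot product with a hypothetical learned vector `w` followed by a softmax, a class prototype is taken to be the mean of the attended support features (in the style of prototypical networks), and a query is assigned to the nearest prototype under squared Euclidean distance. All function names and toy numbers are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions: Sequence[Vector], w: Vector) -> Vector:
    """Weighted sum of regional features.

    Assumption: region i gets score <w, r_i>; the softmax of the scores
    gives attention weights that sum to 1, so salient regions dominate.
    """
    scores = [sum(wi * ri for wi, ri in zip(w, r)) for r in regions]
    alphas = softmax(scores)
    dim = len(regions[0])
    return [sum(a * r[d] for a, r in zip(alphas, regions)) for d in range(dim)]


def prototype(features: Sequence[Vector]) -> Vector:
    """Assumed prototype: element-wise mean of a class's support features."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, prototypes: Dict[str, Vector]) -> str:
    """Nearest-prototype (nearest-neighbor) class assignment."""
    return min(prototypes, key=lambda c: sq_dist(query, prototypes[c]))


# Toy 2-way 2-shot episode: each image is 3 regions of 2-D features,
# where the last region is uninformative background.
w = [1.0, 1.0]  # hypothetical learned attention parameters
support = {
    "cat": [[[1.0, 0.0], [0.9, 0.1], [0.0, 0.0]],
            [[1.1, 0.0], [1.0, 0.2], [0.0, 0.0]]],
    "dog": [[[0.0, 1.0], [0.1, 0.9], [0.0, 0.0]],
            [[0.0, 1.2], [0.2, 1.0], [0.0, 0.0]]],
}
protos = {c: prototype([attend(img, w) for img in imgs])
          for c, imgs in support.items()}

query_cat = [[0.95, 0.05], [1.0, 0.1], [0.0, 0.0]]
query_dog = [[0.05, 0.95], [0.1, 1.0], [0.0, 0.0]]
print(classify(attend(query_cat, w), protos))
print(classify(attend(query_dog, w), protos))
```

Because the softmax weights sum to one, the attended feature stays on the same scale as the regional features while the background region contributes less; in a trained model `w` (or a richer scoring network) would be learned end to end rather than fixed by hand.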