Abstract

The success of deep learning is largely attributable to massive amounts of accurately labeled data. For few-shot learning, and especially zero-shot learning, deep models cannot be trained well because few labeled samples are available. Inspired by the human visual system, attention models have been widely used in action recognition, instance segmentation, and other vision tasks by introducing spatial, temporal, or channel-wise weights. In this paper, we propose a self-attention relation network (SARN) for few-shot learning. SARN consists of three modules: an embedding module, an attention module, and a relation module. The embedding module extracts feature maps, and the attention module enhances the learned features. Finally, the extracted features of the query sample and the support set are fed into the relation module for comparison, and the output relation score is used for classification. Compared with the existing relation network for few-shot learning, SARN can discover non-local information and model long-range dependencies. SARN can be easily extended to zero-shot learning by replacing the support set with semantic vectors. Experiments on benchmarks (Omniglot, miniImageNet, AwA, and CUB) show that the proposed SARN outperforms state-of-the-art algorithms on both few-shot and zero-shot tasks.
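To make the three-module pipeline concrete, below is a minimal PyTorch sketch of the architecture the abstract describes. It assumes a standard convolutional embedding, a non-local style self-attention block (one common way to capture long-range dependencies), and a relation module that scores a concatenated support/query feature pair. All layer sizes and class names (Embedding, SelfAttention, RelationModule) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Embedding(nn.Module):
    """Embedding module: maps an image to a convolutional feature map."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.BatchNorm2d(channels),
            nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
            nn.ReLU(), nn.MaxPool2d(2),
        )
    def forward(self, x):
        return self.net(x)

class SelfAttention(nn.Module):
    """Attention module: every spatial position attends to all others,
    so the output can aggregate non-local information."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight
    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, hw, c//8)
        k = self.key(x).flatten(2)                    # (b, c//8, hw)
        attn = F.softmax(torch.bmm(q, k), dim=-1)     # (b, hw, hw)
        v = self.value(x).flatten(2)                  # (b, c, hw)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                   # residual connection

class RelationModule(nn.Module):
    """Relation module: scores a support/query feature pair in [0, 1]."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Sequential(nn.Linear(channels, 8), nn.ReLU(),
                                nn.Linear(8, 1), nn.Sigmoid())
    def forward(self, support_feat, query_feat):
        pair = torch.cat([support_feat, query_feat], dim=1)  # channel concat
        return self.fc(self.conv(pair).flatten(1))           # relation score

# Hypothetical 5-way 1-shot episode: classify one query against 5 supports.
embed, attend, relate = Embedding(), SelfAttention(64), RelationModule(64)
support = torch.randn(5, 3, 84, 84)
query = torch.randn(1, 3, 84, 84)
s_feat = attend(embed(support))
q_feat = attend(embed(query)).expand(5, -1, -1, -1)
scores = relate(s_feat, q_feat)  # (5, 1); argmax gives the predicted class
```

For the zero-shot variant the abstract mentions, the support branch would embed semantic class vectors (e.g., attribute vectors) instead of images before the same relation comparison; that swap is not shown here.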
