Abstract

Motivated by the observation that discriminative visual features and unambiguous attribute descriptions are both crucial in zero-shot learning (ZSL), we propose Multi-scale Visual Attention for Attribute Disambiguation (MVAAD). MVAAD contains a Multi-Scale Visual Attention Network (MSVAN) that attends to image regions, helping MVAAD learn more discriminative visual features. Building on the multi-scale visual features produced by MSVAN, we further develop a Coarse-to-fine Visual-guided Attribute Selection Module (CVASM) that exploits the multi-scale visual attentive features for attribute disambiguation. MSVAN and CVASM are jointly trained end-to-end by minimizing a visual-semantic classification loss and a latent-visual contrastive triplet loss. Experimental results on four popular ZSL benchmarks, AwA2, CUB, SUN, and FLO, show that MVAAD not only achieves state-of-the-art performance but also yields meaningful, explainable visualizations of the visual attention and the attribute selection.
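The joint objective mentioned above combines a visual-semantic classification loss with a contrastive triplet loss. A minimal NumPy sketch of such a combined objective is shown below; the function names, the margin, and the weighting factor `lam` are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def classification_loss(logits, label):
    """Cross-entropy over visual-semantic compatibility scores (illustrative)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -np.log(probs[label])

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based contrastive triplet loss: pulls the anchor embedding
    toward the positive and pushes it away from the negative (illustrative)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def joint_loss(logits, label, anchor, positive, negative, lam=1.0):
    """Weighted sum of the two losses; lam is a hypothetical hyperparameter."""
    return classification_loss(logits, label) + lam * triplet_loss(
        anchor, positive, negative
    )
```

In this sketch the triplet term vanishes once the anchor is closer to the positive than to the negative by more than the margin, so training pressure shifts to the classification term.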
