Abstract

Zero-shot learning (ZSL) aims to recognize images from unseen classes by transferring semantic knowledge from seen classes to unseen classes. For example, although a person may never have seen a zebra, knowing that "a horse with stripes is a zebra" lets them recognize one easily on first sight. Given semantic descriptions, humans can capture intrinsic visual clues from different channels or appearance factors (e.g., color, texture) on salient parts. Computers cannot yet do this with high accuracy, because they still struggle to learn semantic-aligned visual representations. We therefore propose a semantic-aligned reinforced attention (SRA) model that improves the localization of attributes. Our goal is to discover invariant features related to class-level semantic attributes from variable intra-class visual information, and thereby avoid misalignment between rich visual information and simple semantic representations. Specifically, for spatial attention we develop an efficient constraint applied directly to the feature map that enforces intra-attention compactness and inter-attention dispersion, resembling human gaze. For channel attention, we propose a novel attribute attention cross-entropy loss that exploits the supervision provided by each semantic attribute subset. Experiments on three ZSL benchmarks, namely CUB, SUN and AWA2, show that the proposed method is competitive with state-of-the-art ZSL methods.

• Proposes a ZSL model that mimics the human cognitive process by exploiting reinforced attention in both the spatial and channel dimensions.
• Develops an efficient constraint loss on the feature map for better interpretability.
• Achieves superior or competitive performance on widely used standard benchmarks.
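The abstract does not give the exact form of the spatial constraint. As a rough illustration only, the sketch below shows one common way to express "intra-attention compactness and inter-attention dispersion" on a stack of K attention maps: each map is turned into a spatial distribution, compactness is the expected squared distance of each map from its own center of mass, and dispersion is a hinge penalty applied when two map centers lie closer than a margin. The function names, the `margin` and `weight` parameters, and the normalized-coordinate convention are hypothetical choices for this sketch, not the SRA model's actual loss.

```python
import numpy as np


def spatial_softmax(maps):
    """Turn raw attention maps of shape (K, H, W) into per-map spatial distributions."""
    k, h, w = maps.shape
    flat = maps.reshape(k, h * w)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.reshape(k, h, w)


def compactness_dispersion_loss(maps, margin=0.1, weight=1.0):
    """Hypothetical loss: small spread within each map, well-separated peaks across maps.

    maps:   raw attention logits, shape (K, H, W)
    margin: minimum squared distance between map centers (normalized coordinates)
    weight: trade-off between the compactness and dispersion terms
    Returns (total, compactness, dispersion).
    """
    probs = spatial_softmax(maps)
    k, h, w = probs.shape

    # Pixel coordinates normalized to [0, 1] along each axis: (H, W, 2)
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    coords = np.stack([ys, xs], axis=-1)

    # Center of mass of each attention map: (K, 2)
    centers = np.einsum("khw,hwc->kc", probs, coords)

    # Intra-attention compactness: expected squared distance to the map's own center
    diff = coords[None, :, :, :] - centers[:, None, None, :]  # (K, H, W, 2)
    compact = np.sum(probs * np.sum(diff ** 2, axis=-1)) / k

    # Inter-attention dispersion: hinge on pairwise squared center distances
    pair_d2 = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    off_diag = ~np.eye(k, dtype=bool)
    disperse = np.sum(np.maximum(0.0, margin - pair_d2[off_diag]))
    if k > 1:
        disperse /= k * (k - 1)  # average over ordered pairs

    return compact + weight * disperse, compact, disperse


# Usage: sharp, well-separated maps score lower than diffuse, overlapping ones.
peaked = np.zeros((2, 7, 7))
peaked[0, 0, 0] = 50.0
peaked[1, 6, 6] = 50.0
print(compactness_dispersion_loss(peaked)[0] < compactness_dispersion_loss(np.zeros((2, 7, 7)))[0])
```

NumPy is used here only to make the arithmetic concrete; in training, the same expressions would be written in an autodiff framework so that the loss can back-propagate into the feature map.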
