Abstract

Zero-shot learning (ZSL) aims to recognize unseen categories whose samples are unavailable during training. Recently, generative models have shown promise on this challenging problem by synthesizing unseen-class features conditioned on semantic embeddings such as attributes. However, unidirectional generative models cannot guarantee effective coupling between the visual and semantic spaces. To this end, we propose a visual-semantic aligned bidirectional network with cycle consistency that narrows the gap between these two spaces and generates high-quality unseen features. More importantly, we incorporate two carefully designed strategies into our bidirectional framework to improve overall ZSL performance. Specifically, we enhance the intra-domain class divergence in both the visual and semantic spaces, and at the same time mitigate the inter-domain shift to preserve seen-unseen domain discrimination. Experimental results on four standard benchmarks show the superiority of our framework over existing state-of-the-art methods under both conventional and generalized ZSL settings.
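The cycle-consistency idea the abstract describes can be illustrated with a minimal sketch. This is not the paper's actual model: the two generators are stood in for by hypothetical linear maps (`W_g`, `W_f`), and the loss shown is the generic L2 reconstruction error of the semantic → visual → semantic cycle.

```python
import numpy as np

# Hypothetical linear stand-ins for the paper's bidirectional generators:
# W_g maps a semantic (attribute) vector to a visual feature,
# W_f maps a visual feature back to the semantic space.
rng = np.random.default_rng(0)
d_sem, d_vis = 4, 6
W_g = rng.normal(size=(d_vis, d_sem))   # semantic -> visual
W_f = np.linalg.pinv(W_g)               # visual -> semantic (pseudo-inverse)

def cycle_consistency_loss(s: np.ndarray) -> float:
    """L2 reconstruction error of the semantic -> visual -> semantic cycle."""
    v = W_g @ s        # synthesize a visual feature from the attribute vector
    s_rec = W_f @ v    # map the generated feature back to the semantic space
    return float(np.sum((s_rec - s) ** 2))

s = rng.normal(size=d_sem)              # a class attribute vector
loss = cycle_consistency_loss(s)
```

Because `W_f` is the pseudo-inverse of a full-column-rank `W_g`, the cycle reconstructs `s` exactly and the loss is near zero; in the learned bidirectional setting, minimizing this loss is what pushes the two generators to stay mutually consistent.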
