Attribute-Induced Bias Eliminating for Transductive Zero-Shot Learning

Hantao Yao, Yongdong Zhang, Changsheng Xu, Shaobo Min

doi:10.1109/tmm.2021.3074252

Abstract

Transductive zero-shot learning (ZSL) is designed to recognize unseen categories by aligning visual and semantic information in a joint embedding space. Four types of domain bias exist in transductive ZSL, i.e., visual bias and semantic bias between the seen and unseen domains, and two visual-semantic biases, one within the seen domain and one within the unseen domain. However, existing work has addressed only some of these biases, leading to severe semantic ambiguity during knowledge transfer. To solve this problem, we propose a novel attribute-induced bias eliminating (AIBE) module for transductive ZSL. Specifically, for the visual bias between the two domains, a mean-teacher module is first used to bridge the visual representation discrepancy between the two domains through unsupervised learning on unlabeled images. Then, an attentional graph attribute embedding is proposed to reduce the semantic bias between seen and unseen categories, using a graph operation to describe the semantic relationships between categories. To reduce the semantic-visual bias in the seen domain, we align the visual center of each category, rather than each individual visual data point, with the corresponding semantic attributes, which preserves the semantic relationships in the embedding space.
Finally, for the semantic-visual bias in the unseen domain, an unseen semantic alignment constraint is designed to align the visual and semantic spaces in an unsupervised manner. Evaluations on several benchmarks demonstrate the effectiveness of the proposed method, e.g., 82.8%/75.5%, 97.1%/82.5%, and 73.2%/52.1% under the conventional/generalized ZSL settings on the CUB, AwA2, and SUN datasets, respectively.
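
The idea of aligning each category's visual center with its semantic attributes can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the squared-Euclidean distance, and the assumption that the semantic embeddings have already been projected into the same space as the visual features are all illustrative choices, and the loss used in the paper may differ.

```python
from collections import defaultdict


def class_centers(features, labels):
    """Average the feature vectors of each class into one center per class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in total] for c, total in sums.items()}


def center_alignment_loss(features, labels, semantic_embeddings):
    """Mean squared distance between each class center and its semantic embedding.

    Assumption (hypothetical setup): semantic_embeddings[c] is already
    projected into the same space, with the same dimensionality, as the
    visual features.
    """
    centers = class_centers(features, labels)
    total = 0.0
    for c, center in centers.items():
        target = semantic_embeddings[c]
        total += sum((a - b) ** 2 for a, b in zip(center, target))
    return total / len(centers)


# Toy data: two "bird" samples and one "dog" sample in a 2-D feature space.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = ["bird", "bird", "dog"]
semantic_embeddings = {"bird": [2.0, 0.0], "dog": [0.0, 1.0]}

loss = center_alignment_loss(features, labels, semantic_embeddings)
print(loss)
```

In this toy case neither "bird" sample coincides with the bird embedding, yet their center does, so the bird class contributes nothing to the loss. A per-sample alignment would instead pull both samples onto the same point, which is the behavior the center-based formulation avoids.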

Full Text