Abstract

Zero-shot learning (ZSL) aims to recognize classes that are unseen during training. Transductive methods have advanced ZSL; however, they often rely on pseudo labels derived from confidence scores, and noisy pseudo labels cause semantic misalignment between unseen-class image features and the corresponding class semantic descriptions. In this paper, we introduce a novel Consistency-Guided Pseudo-Labeling (CGPL) method to generate high-quality pseudo labels, achieving a robust mapping from the visual to the semantic space for unseen classes. CGPL incorporates a large-scale vision-language model as a collaborator with the ZSL model to generate high-quality pseudo labels. Pseudo-labeled samples on which the two models' predictions are consistent are then added to the training set to learn the visual-to-semantic mapping for unseen classes. Furthermore, we design a quasi-classification loss based on reconstructed unseen prototypes to learn an accurate visual-semantic mapping. Consequently, CGPL is encouraged to obtain higher-quality pseudo labels and progressively learns a precise visual-semantic mapping for unseen classes throughout the iterative process. Extensive experimental results on four benchmark datasets highlight the superior performance of CGPL in both conventional (CZSL) and generalized (GZSL) zero-shot learning settings.
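
To make the consistency criterion concrete, the following is a minimal sketch, not the authors' released code: it assumes a PyTorch setting in which the ZSL model and the vision-language model each produce class scores over the unseen classes for a batch of unlabeled samples, and it keeps only the samples on which the two predictions agree. All function and variable names here are hypothetical.

    import torch

    def consistency_pseudo_labels(zsl_logits: torch.Tensor,
                                  vlm_logits: torch.Tensor):
        """Select pseudo-labeled samples on which the ZSL model and the
        vision-language model predict the same unseen class.

        zsl_logits, vlm_logits: (num_samples, num_unseen_classes)
        Returns indices of consistent samples and their shared labels.
        """
        zsl_pred = zsl_logits.argmax(dim=1)   # ZSL model's predicted class
        vlm_pred = vlm_logits.argmax(dim=1)   # vision-language model's prediction
        consistent = zsl_pred == vlm_pred     # agreement mask
        idx = consistent.nonzero(as_tuple=True)[0]
        return idx, zsl_pred[idx]

    # Toy usage: 5 unlabeled samples, 3 unseen classes.
    zsl = torch.randn(5, 3)
    vlm = torch.randn(5, 3)
    idx, labels = consistency_pseudo_labels(zsl, vlm)
    # Samples in `idx` would be added to the training set with `labels`
    # as their pseudo labels for the next training iteration.

Filtering by cross-model agreement rather than by a single model's confidence score is what distinguishes this selection step from standard confidence-thresholded pseudo-labeling.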
