Abstract

Compositional Zero-Shot Learning (CZSL) requires recognizing unseen attribute-object compositions from the visual primitives (attributes and objects) observed in a training set, a critical capability for learning systems because the long tail of novel combinations dominates real-world distributions. However, CZSL is challenging because learning systems tend to entangle objects with their attributes; such spurious dependencies hinder composition classification and mislead the recognition of new combinations of known attributes and objects. This paper introduces a novel and effective dual-stream contrastive learning method with two main objectives: making the learned representations discriminative and transferring knowledge more efficiently from seen to unseen compositions. Specifically, we generate positive and negative pairs based on the similarity of different concepts (attributes and objects), capturing discriminative representations of each concept independently. Meanwhile, unlike existing contrastive methods that select negative samples randomly, we construct confusable compositional representations as negatives to exploit the intrinsic relevance between attributes and objects, which improves generalization from seen to unseen compositions. Experimental results on two benchmarks show that the proposed method outperforms state-of-the-art approaches.
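To make the contrastive objective concrete, the sketch below shows a generic InfoNCE-style loss in which each anchor composition embedding is pulled toward a positive sharing a concept (the same attribute or object) and pushed away from hard, confusable composition negatives rather than random ones. This is a minimal illustration under assumed tensor shapes; the function name `contrastive_loss` and all dimensions are hypothetical and do not reproduce the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchors, positives, negatives, temperature=0.1):
    """InfoNCE-style loss: pull anchors toward positives, push away from
    confusable negatives. anchors/positives: (B, D); negatives: (B, K, D)."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    n = F.normalize(negatives, dim=-1)

    pos_sim = (a * p).sum(dim=-1, keepdim=True) / temperature    # (B, 1)
    neg_sim = torch.einsum('bd,bkd->bk', a, n) / temperature     # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1)                # (B, 1+K)
    targets = torch.zeros(len(a), dtype=torch.long)              # positive sits at index 0
    return F.cross_entropy(logits, targets)

# Illustrative usage: 8 anchor composition embeddings, each with one
# concept-sharing positive and 16 confusable composition negatives.
B, K, D = 8, 16, 128
loss = contrastive_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, K, D))
```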
