Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize previously unseen compositions of known objects (e.g. apple, banana) and their states (e.g. ripe, unripe) in an image. CZSL is challenging because it is difficult to isolate the visual features of an object and its state from their composition in an image. The features of a state may also vary widely across compositions. For example, the state "sliced" has different visual features in the compositions "sliced apple" and "sliced tomato". In this paper, we address CZSL with a two-stage recognition approach. The stages perform recognition sequentially, using two distinct modalities of the compositions. The two modalities are image features and textual features, representing features of objects and states respectively. We propose a novel gradient-regularized loss term to better disentangle object and state features from the visual features of the composition. Appropriate disentanglement of the features of the visual primitives (states and objects) leads to correct identification of images of unseen state-object compositions. The proposed approach and competing methods are evaluated on three benchmark datasets: MIT States, UT-Zappos50K and CGQA. Our extensive experiments show that the proposed algorithm outperforms other state-of-the-art approaches.
