The major disadvantage of supervised methods for action recognition is the need for a large amount of accurately labelled data. Zero-Shot Learning (ZSL) was introduced to address this issue. ZSL primarily uses synthesized data to compensate for the lack of training examples. In this paper, two approaches are proposed for synthesizing artificial examples of novel classes: an inverse autoregressive flow (IAF) based generative model and a bi-directional adversarial GAN (Bi-dir GAN). The proposed approach also lends itself to a transductive setting using a semi-supervised variational autoencoder, in which unlabelled data from unseen classes are used to train the model. This enables the generation of novel class examples from textual descriptions. The proposed models perform well in two settings: i) the standard setting (ZSL), where the test data comes only from unseen classes, and ii) the generalized setting (GZSL), where the test data comes from both seen and unseen classes. In the generalized setting, examples with pseudo labels are generated for unseen classes. Experiments are performed on three benchmark datasets: UCF101, HMDB51, and Olympic. Both proposed models, the IAF based generative model and the Bi-dir GAN model, outperform state-of-the-art approaches on the UCF101 and Olympic datasets in all settings and achieve comparable results on HMDB51.