Abstract
Zero-shot learning (ZSL) aims to classify images into categories unseen during training by relying on detailed attribute annotations. Generalized zero-shot learning (GZSL) additionally includes seen categories among the test samples; because the learned classifier is inherently biased toward seen categories, GZSL is more challenging than conventional ZSL. However, no dataset with detailed attribute descriptions currently exists for video classification, so existing zero-shot video classification methods rely on generative adversarial networks, trained on seen-class features, to synthesize unseen-class features for ZSL classification. To address this problem, we propose a descriptive text dataset built on the UCF101 action recognition dataset. To the best of our knowledge, this is the first work to add class descriptions to zero-shot video classification. We further propose a new loss function that combines visual features with textual features: we extract text features from the proposed dataset and constrain the generation of synthetic features under the principle that videos with similar textual descriptions should have similar visual features. Our method thus brings the traditional zero-shot learning paradigm back to video classification. Experimentally, the proposed dataset and method have a positive impact on generalized zero-shot video classification.
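The abstract does not specify the exact form of the proposed loss, but the stated principle, that videos with similar textual descriptions should have similar synthetic visual features, can be sketched as a pairwise-similarity consistency term. The following is a minimal, hypothetical NumPy illustration (the function names and the squared-difference form are assumptions, not the paper's actual formulation):

```python
import numpy as np

def cosine_sim_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-8, None)  # avoid division by zero
    return Xn @ Xn.T

def text_consistency_loss(text_feats, synth_feats):
    """Hypothetical sketch of the text-similarity constraint:
    penalize disagreement between the pairwise similarity structure
    of class text features and of synthesized visual features."""
    S_text = cosine_sim_matrix(text_feats)
    S_vis = cosine_sim_matrix(synth_feats)
    return float(np.mean((S_text - S_vis) ** 2))

# Toy usage: 4 classes, 8-dim text embeddings, 8-dim synthetic features.
rng = np.random.default_rng(0)
T = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
loss = text_consistency_loss(T, V)
```

Such a term would be added to the generator's adversarial objective with a weighting coefficient; it is zero exactly when the two similarity matrices coincide.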