Enhancing Zero-Shot Action Recognition in Videos by Combining GANs with Text and Images

Kaiqiang Huang, Luis Miralles-Pechuán, Susan Mckeever

doi:10.1007/s42979-023-01803-3

Open Access

https://doi.org/10.1007/s42979-023-01803-3

Journal: SN Computer Science
Publication Date: May 5, 2023
Citations: 4
License type: open-access

Affiliation: Technological University Dublin

Abstract

Zero-shot action recognition (ZSAR) tackles the problem of recognising actions that the model has not seen during the training phase. Various techniques have been used to achieve ZSAR in the field of human action recognition (HAR) in videos. Techniques based on generative adversarial networks (GANs) are the most promising in terms of performance. In these techniques, GANs are trained to generate representations of videos from unseen classes, conditioned on information related to those classes, such as class label embeddings. In this paper, we present an approach that combines information from two different GANs, both of which generate visual representations of unseen classes. Our dual-GAN approach leverages two separate knowledge sources related to the unseen classes: class-label texts and images related to the class label obtained from Google Images. The visual embeddings of the unseen classes generated by the two GANs are merged and used to train a classifier in a supervised-learning fashion for ZSAR classification. Our methodology is based on the idea that using more and richer knowledge sources to generate unseen-class representations will lead to higher downstream accuracy when classifying unseen classes. The experimental results show that our dual-GAN approach outperforms state-of-the-art methods on two benchmark HAR datasets: HMDB51 and UCF101. Additionally, we present a comprehensive discussion and analysis of the experimental results for both datasets to understand the nuances of each approach at the class level. Finally, we examine how the number of visual embeddings generated by the two GANs affects the accuracy of the models.
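
The merge-then-classify step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two "GANs" are untrained random linear maps standing in for trained conditional generators, the text and image embeddings are random placeholders, "merging" is read as pooling the two sets of synthetic embeddings into one training set (the abstract does not say how the merge is done), and the classifier is a simple nearest-centroid rule. The names `make_generator`, `text_gan`, `image_gan` and `predict` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_CLASSES, N_PER_GAN = 3, 50
TEXT_DIM, IMAGE_DIM, NOISE_DIM, FEAT_DIM = 8, 12, 4, 16


def make_generator(cond_dim):
    """Stand-in for a trained conditional generator (a fixed random linear map)."""
    weights = rng.normal(size=(cond_dim + NOISE_DIM, FEAT_DIM))

    def generate(cond, n_samples):
        # Each sample pairs the same class condition with fresh noise.
        noise = 0.1 * rng.normal(size=(n_samples, NOISE_DIM))
        cond_rows = np.tile(cond, (n_samples, 1))
        return np.concatenate([cond_rows, noise], axis=1) @ weights

    return generate


text_gan = make_generator(TEXT_DIM)    # conditioned on class-label text
image_gan = make_generator(IMAGE_DIM)  # conditioned on class-related images

# Placeholder knowledge sources: one embedding per unseen class.
text_emb = rng.normal(size=(N_CLASSES, TEXT_DIM))
image_emb = rng.normal(size=(N_CLASSES, IMAGE_DIM))

# Merge step (assumed here to be pooling): synthetic visual embeddings from
# both generators go into one labelled training set.
features, labels = [], []
for c in range(N_CLASSES):
    from_text = text_gan(text_emb[c], N_PER_GAN)
    from_image = image_gan(image_emb[c], N_PER_GAN)
    features.append(np.concatenate([from_text, from_image], axis=0))
    labels.append(np.full(2 * N_PER_GAN, c))
X = np.concatenate(features)
y = np.concatenate(labels)

# Supervised classifier on synthetic data: one centroid per unseen class.
centroids = np.stack([X[y == c].mean(axis=0) for c in range(N_CLASSES)])


def predict(video_features):
    """Assign each visual embedding to the class with the nearest centroid."""
    diffs = video_features[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diffs, axis=2).argmin(axis=1)
```

In a real pipeline, `predict` would be applied to visual embeddings extracted from videos of the unseen classes. Raising `N_PER_GAN` corresponds to the abstract's final question about how the number of generated embeddings affects accuracy.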

Similar Papers

Human action recognition in surveillance video of a computer laboratory
Abdul-Lateef Yussiff ... Yong Suet-Peng
01 Aug 2016

Action Recognition in Untrimmed Videos with Composite Self-attention Two-Stream Framework
Dong Cao ... Lisha Xu
01 Jan 2020

Understanding action recognition in still images
Deeptha Girish ... Anca Ralescu
01 Jun 2020

Audio and Video Feature Fusion for Activity Recognition in Unconstrained Videos
José Lopes ... Sameer Singh
01 Jan 2006
