Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zero-shot Classification and Retrieval of Videos

Kranti Kumar Parida, Tanaya Guha, Gaurav Sharma, Neeraj Matiyali

doi:10.1109/wacv45572.2020.9093438

Open Access

https://doi.org/10.1109/wacv45572.2020.9093438

Abstract

We present an audio-visual multimodal approach for the task of zero-shot learning (ZSL) for classification and retrieval of videos. ZSL has been studied extensively in the recent past but has primarily been limited to the visual modality and to images. We demonstrate that both audio and visual modalities are important for ZSL for videos. Since a dataset to study the task is currently not available, we also construct an appropriate multimodal dataset with 33 classes containing 156,416 videos, from an existing large-scale audio event dataset. We empirically show that performance improves when the audio modality is added, for both zero-shot classification and retrieval, when using multimodal extensions of embedding learning methods. We also propose a novel method to predict the ‘dominant’ modality using a jointly learned modality attention network. We learn the attention in a semi-supervised setting and thus do not require any additional explicit labelling for the modalities. We provide qualitative validation of the modality-specific attention, which also successfully generalizes to unseen test classes.
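As a rough illustration of the idea of a 'dominant' modality, the sketch below combines per-modality distances to class embeddings using softmax attention weights and predicts the nearest class. The abstract does not specify the architecture, loss, or embedding dimensions, so everything here is an assumption made for illustration: the function names (`softmax`, `sq_dist`, `zero_shot_predict`), the squared Euclidean distance, the hand-picked attention logits (which a learned attention network would produce in practice), and the toy two-dimensional embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def zero_shot_predict(audio_emb, video_emb, class_embs, attention_logits):
    """Pick the class whose embedding is closest under an
    attention-weighted sum of audio and video distances.

    attention_logits is [audio_logit, video_logit]; here it is supplied
    by hand (hypothetical), standing in for a learned attention network.
    """
    w_audio, w_video = softmax(attention_logits)
    best_label, best_score = None, float("inf")
    for label, class_emb in class_embs.items():
        score = (w_audio * sq_dist(audio_emb, class_emb)
                 + w_video * sq_dist(video_emb, class_emb))
        if score < best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (hypothetical): class embeddings could include unseen classes,
# since prediction only needs an embedding of the class label.
class_embs = {"dog": [1.0, 0.0], "train": [0.0, 1.0]}
audio_emb = [0.9, 0.1]  # audio evidence points towards "dog"
video_emb = [0.4, 0.6]  # video evidence leans towards "train"

pred_audio_dominant = zero_shot_predict(audio_emb, video_emb, class_embs, [2.0, 0.0])
pred_video_dominant = zero_shot_predict(audio_emb, video_emb, class_embs, [0.0, 2.0])
print(pred_audio_dominant, pred_video_dominant)
```

With the same audio and video embeddings, shifting the attention towards one modality changes which class is nearest, which is the intuition behind weighting modalities per sample rather than fusing them with fixed weights.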

Publication Date: Mar 1, 2020
Citations: 58
License type: other-oa
