Abstract

Zero-shot Natural Language-Video Localization (NLVL) methods have exhibited promising results in training NLVL models exclusively with raw video data by dynamically generating video segments and pseudo-query annotations. However, existing pseudo-queries often lack grounding in the source video, resulting in unstructured and disjointed content. In this paper, we investigate the effectiveness of commonsense reasoning in zero-shot NLVL. Specifically, we present CORONET, a zero-shot NLVL framework that leverages commonsense knowledge to bridge the gap between videos and generated pseudo-queries via a commonsense enhancement module. CORONET employs Graph Convolutional Networks (GCNs) to encode commonsense information extracted from a knowledge graph, conditioned on the video, and cross-attention mechanisms to enhance the encoded video and pseudo-query representations prior to localization. Through empirical evaluations on two benchmark datasets, we demonstrate that CORONET surpasses both zero-shot and weakly supervised baselines, achieving improvements of up to 32.13% across various recall thresholds and up to 6.33% in mIoU. These results underscore the significance of leveraging commonsense reasoning for zero-shot NLVL.
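To make the described architecture concrete, below is a minimal sketch in PyTorch of a commonsense enhancement module of the kind the abstract outlines: a GCN encodes commonsense nodes drawn from a knowledge graph, and cross-attention injects those embeddings into video (or pseudo-query) features before localization. All names (GCNLayer, CommonsenseEnhancer), dimensions, the residual wiring, and the use of nn.MultiheadAttention are illustrative assumptions based solely on this abstract, not the authors' released CORONET implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (N, in_dim) commonsense node embeddings
        # adj: (N, N) normalized adjacency of the commonsense subgraph
        return F.relu(adj @ self.linear(node_feats))


class CommonsenseEnhancer(nn.Module):
    """Encodes commonsense nodes with a GCN, then enhances video or
    pseudo-query token features with them via cross-attention."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.gcn = GCNLayer(dim, dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats, node_feats, adj):
        # feats: (B, T, dim) video or pseudo-query token features
        cs = self.gcn(node_feats, adj)                      # (N, dim)
        cs = cs.unsqueeze(0).expand(feats.size(0), -1, -1)  # (B, N, dim)
        # Tokens attend to commonsense embeddings; residual keeps the
        # original representation intact.
        enhanced, _ = self.cross_attn(query=feats, key=cs, value=cs)
        return feats + enhanced


if __name__ == "__main__":
    B, T, N, D = 2, 16, 10, 256
    video = torch.randn(B, T, D)      # dummy video features
    nodes = torch.randn(N, D)         # dummy commonsense node features
    adj = torch.eye(N)                # placeholder normalized adjacency
    out = CommonsenseEnhancer(D)(video, nodes, adj)
    print(out.shape)                  # torch.Size([2, 16, 256])

In this sketch the same module could be applied twice, once to video features and once to pseudo-query features, matching the abstract's statement that both representations are enhanced prior to localization; how the knowledge-graph nodes are selected and conditioned on the video is not specified here.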
