Transferable Video Moment Localization by Moment-Guided Query Prompting

Hao Jiang, Yadong Mu, Yang Yizhang

doi:10.1609/aaai.v38i3.28028

Abstract

Video moment localization is a key task in computer vision: given a natural language query, the goal is to identify the temporal moments in an untrimmed video that are semantically relevant to it. This work examines a relatively unexplored aspect of the task: the transferability of video moment localization models. We study this by evaluating moment localization models in a cross-domain transfer setting, for which we curate multiple datasets with substantial domain gaps. The model is trained on one of these datasets and validated and tested on the remaining ones. To address the challenges of this setting, we draw on recently introduced large-scale pre-trained vision-language models and explore how they can be used strategically to strengthen a video moment localization model. However, the language queries in video moment localization usually follow a different distribution from the text these pre-trained models were trained on, differing in length, content, expression, and other aspects. To mitigate this gap, we propose a Moment-Guided Query Prompting (MGQP) method for video moment localization. Our key idea is to generate multiple distinct and complementary prompt primitives by stratifying the original queries. Our approach consists of a prompt primitive constructor, a multimodal prompt refiner, and a holistic prompt incorporator. We carry out extensive experiments on the Charades-STA, TACoS, DiDeMo, and YouCookII datasets and investigate the efficacy of the proposed method with various pre-trained models, including CLIP, ActionCLIP, CLIP4Clip, and VideoCLIP. The experimental results demonstrate the effectiveness of the proposed method.
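The cross-domain protocol described above (train on one dataset, evaluate on the others) can be illustrated with a minimal sketch. It assumes that all four datasets named in the abstract form a single transfer pool; the function name `cross_domain_splits` is hypothetical, and because the abstract does not say how the held-out datasets are divided between validation and testing, the sketch returns them together.

```python
from typing import List, Tuple

# Datasets named in the abstract; treating all four as one transfer pool is an assumption.
DATASETS: List[str] = ["Charades-STA", "TACoS", "DiDeMo", "YouCookII"]


def cross_domain_splits(datasets: List[str]) -> List[Tuple[str, List[str]]]:
    """Pair each source (training) dataset with the remaining held-out datasets.

    Hypothetical helper: the split of held-out datasets into validation and
    test sets is not specified in the abstract, so they are returned together.
    """
    splits = []
    for source in datasets:
        # Every dataset other than the training source is held out for evaluation.
        held_out = [d for d in datasets if d != source]
        splits.append((source, held_out))
    return splits


if __name__ == "__main__":
    for source, held_out in cross_domain_splits(DATASETS):
        print(f"train on {source:<12} -> evaluate on {', '.join(held_out)}")
```

Each returned pair corresponds to one training run, and by construction a model is never evaluated on the dataset it was trained on, which is what makes the setting a test of transferability rather than in-domain accuracy.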

Published in: Proceedings of the AAAI Conference on Artificial Intelligence
