Text-Based Localization of Moments in a Video Corpus.

Sudipta Paul, Amit K Roy-Chowdhury, Niluthpol Chowdhury Mithun

doi:10.1109/tip.2021.3120038


Open Access

https://doi.org/10.1109/tip.2021.3120038


Abstract

Prior works on text-based video moment localization focus on temporally grounding the textual query in an untrimmed video. These works assume that the relevant video is already known and attempt to localize the moment in that video only. Different from such works, we relax this assumption and address the task of localizing moments in a corpus of videos for a given sentence query. This task poses a unique challenge because the system must perform: 1) retrieval of the relevant video, where only a segment of the video corresponds to the queried sentence, and 2) temporal localization of the moment in the relevant video based on the sentence query. To overcome this challenge, we propose the Hierarchical Moment Alignment Network (HMAN), which learns an effective joint embedding space for moments and sentences. In addition to learning subtle differences between intra-video moments, HMAN focuses on distinguishing inter-video global semantic concepts based on sentence queries. Qualitative and quantitative results on three benchmark text-based video moment retrieval datasets (Charades-STA, DiDeMo, and ActivityNet Captions) demonstrate that our method achieves promising performance on the proposed task of temporal localization of moments in a corpus of videos.
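The abstract does not describe HMAN's architecture, so the following is only a minimal sketch of how a joint embedding space lets one ranking cover both sub-tasks (video retrieval and temporal localization). It assumes that query and candidate-moment embeddings have already been produced by some trained model in a shared space and that cosine similarity is the scoring function; the names `cosine` and `localize_in_corpus`, the corpus layout, and the toy vectors are all hypothetical and are not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def localize_in_corpus(
    query_emb: Vector,
    corpus: Dict[str, List[Tuple[float, float, Vector]]],
    top_k: int = 5,
) -> List[Tuple[str, float, float, float]]:
    """Rank every candidate moment of every video against one sentence query.

    ASSUMPTION: `corpus` maps a video id to precomputed
    (start_sec, end_sec, moment_embedding) tuples in the same space as the query.
    Returns the top_k (video_id, start_sec, end_sec, score) tuples, best first,
    so the top result identifies both the video and the moment inside it.
    """
    scored = [
        (vid, start, end, cosine(query_emb, emb))
        for vid, moments in corpus.items()
        for start, end, emb in moments
    ]
    scored.sort(key=lambda item: item[3], reverse=True)
    return scored[:top_k]


# Hypothetical toy corpus with 2-D embeddings, for illustration only.
corpus = {
    "video_a": [(0.0, 5.0, [1.0, 0.0]), (5.0, 10.0, [0.0, 1.0])],
    "video_b": [(0.0, 8.0, [0.7, 0.7])],
}
query = [0.0, 1.0]
ranking = localize_in_corpus(query, corpus, top_k=2)
print(ranking)
```

Scoring moments from all videos in one pool, rather than first picking a video and then searching inside it, is what makes the inter-video distinctions mentioned in the abstract matter: a moment from an irrelevant video can outrank the correct one unless the embedding separates global semantic concepts across videos.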


Journal: IEEE Transactions on Image Processing
Publication Date: Jan 1, 2021
Citations: 12
License type: publisher-specific, author manuscript


Similar Papers

STCM-Net: A symmetrical one-stage network for temporal language localization in videos
Zixi Jia ... Chunbo Li
Neurocomputing | VOL. 471 | 16 Nov 2021

Exploiting Auxiliary Caption for Video Grounding
Hongxiang Li ... Yaowei Li
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 38 | 24 Mar 2024

Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
Songyang Zhang ... Jiebo Luo
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 34 | 03 Apr 2020

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding
Daizong Liu ... Zichuan Xu
01 Jun 2021
