Abstract
Given a language query, the temporal grounding task is to localize the temporal boundaries of the described event in an untrimmed video. A long-standing challenge is that multiple moments may be associated with the same video-query pair, termed label uncertainty. Existing methods struggle to localize such diverse moments because multi-label annotations are unavailable. In this paper, we propose a novel Diverse Temporal Grounding framework (DTG) that achieves diverse moment localization with only single-label annotations. By delving into label uncertainty, we find that the diverse moments retrieved tend to involve similar actions/objects, which motivates us to perceive these moments of interest. Specifically, we construct soft multi-labels from the semantic similarity of multiple video-query pairs. These soft labels reveal whether multiple moments within a video contain similar verbs/nouns, thereby guiding interest-moment generation. Meanwhile, we put forward a diverse moment regression network (DMRNet) that produces multiple predictions in a single pass, dynamically picking plausible moments from the interest moments for joint optimization. Moreover, we introduce new metrics that better reveal multi-output performance. Extensive experiments on Charades-STA and ActivityNet Captions show that our method achieves state-of-the-art performance on both the standard and the new metrics.
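To make the two mechanisms the abstract names more concrete, the sketch below is a minimal, hypothetical illustration, not the authors' implementation: soft multi-labels are built from pairwise similarity of verb/noun embeddings across queries annotated on the same video, and a multi-output regressor is trained by matching each sufficiently plausible moment to its closest prediction. All function names, shapes, and the threshold `tau` are assumptions introduced here for illustration.

```python
import torch
import torch.nn.functional as F

def soft_multi_labels(query_embs: torch.Tensor) -> torch.Tensor:
    """Hypothetical soft-label construction.

    query_embs: [M, D] verb/noun embeddings of the M queries annotated
    on one video. Entry (i, j) of the result is a soft label in [0, 1]
    indicating how plausible query j's annotated moment is for query i.
    """
    q = F.normalize(query_embs, dim=-1)
    return (q @ q.t()).clamp_min(0.0)  # cosine similarity, negatives zeroed

def diverse_regression_loss(preds: torch.Tensor,
                            interest_moments: torch.Tensor,
                            soft_row: torch.Tensor,
                            tau: float = 0.5) -> torch.Tensor:
    """Hypothetical joint loss over multiple predictions from a single pass.

    preds:            [K, 2] K predicted (start, end) pairs for one query
    interest_moments: [M, 2] annotated moments of all intra-video queries
    soft_row:         [M]    soft labels of those moments for this query
    Interest moments whose soft label exceeds tau are kept as plausible;
    each is matched to its closest prediction, and the per-moment
    smooth-L1 losses are averaged with the soft labels as weights.
    """
    keep = soft_row > tau
    plausible, weights = interest_moments[keep], soft_row[keep]
    # pairwise L1 distance between every prediction and plausible moment
    dist = (preds[:, None, :] - plausible[None, :, :]).abs().sum(-1)  # [K, M']
    best = dist.argmin(dim=0)  # index of the closest prediction per moment
    loss = F.smooth_l1_loss(preds[best], plausible, reduction="none").sum(-1)
    return (weights * loss).sum() / weights.sum()
```

Note that the query's own annotated moment always survives the threshold (its self-similarity is 1), so the loss reduces to ordinary single-label regression when no other intra-video query is semantically similar.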