Temporally Grounding Language Queries in Videos by Contextual Boundary-Aware Prediction

Jingwen Wang, Lin Ma, Wenhao Jiang

doi:10.1609/aaai.v34i07.6897

Abstract

The task of temporally grounding language queries in videos is to temporally localize the video segment that best matches a given language query (sentence). It requires models to perform visual and linguistic understanding simultaneously. Previous work has predominantly ignored the precision of segment localization. Sliding-window-based methods use predefined search window sizes and suffer from redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue by proposing an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To better detect semantic boundaries, we propose to aggregate contextual information by explicitly modeling the relationship between the current element and its neighbors. At the testing stage, the most confident segments are selected based on both anchor and boundary predictions. The proposed model, dubbed Contextual Boundary-aware Prediction (CBP), outperforms its competitors by a clear margin on three public datasets.
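
The abstract says that segments are selected at test time from both anchor and boundary predictions, but it does not state the scoring rule. The sketch below is only an illustration under an assumed rule: each anchor's confidence is multiplied by the boundary probabilities at its start and end steps. The function name `rescore_anchors`, its arguments, and the product fusion are all hypothetical and are not taken from the paper.

```python
def rescore_anchors(anchor_scores, boundary_probs, anchor_lengths):
    """Hypothetical fusion of anchor and boundary predictions.

    anchor_scores[t][k]: confidence that the segment ending at time step t
        with length anchor_lengths[k] matches the query.
    boundary_probs[t]: probability that time step t is a semantic boundary.
    Returns (start, end, score) tuples sorted by descending score.
    """
    candidates = []
    for t, scores_at_t in enumerate(anchor_scores):
        for k, anchor_conf in enumerate(scores_at_t):
            start = t - anchor_lengths[k] + 1
            if start < 0:
                continue  # this anchor would begin before the video starts
            # Assumed fusion rule (not from the paper): anchor confidence
            # times the boundary probabilities at both ends of the segment.
            score = anchor_conf * boundary_probs[start] * boundary_probs[t]
            candidates.append((start, t, score))
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates


# Toy input: 4 time steps, 2 anchor lengths (1 and 3 steps).
anchor_scores = [[0.1, 0.0], [0.2, 0.3], [0.5, 0.9], [0.4, 0.6]]
boundary_probs = [0.9, 0.1, 0.8, 0.2]
ranked = rescore_anchors(anchor_scores, boundary_probs, anchor_lengths=[1, 3])
best_start, best_end, best_score = ranked[0]
```

In this toy input the long anchor ending at step 2 ranks first because both of its end points coincide with high boundary probabilities, which is the intuition behind letting boundary predictions refine anchor scores.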

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 136
