Abstract

Temporal sentence grounding in videos (TSGV) aims to retrieve the segment of an untrimmed video that semantically matches a given query. Most previous methods learn either local or global query features and then perform cross-modal interaction, ignoring the complementarity between local and global features. In this paper, we propose a novel Multi-Level Interaction Network for temporal sentence grounding in videos. The network explores query semantics at both the phrase and sentence levels: phrase-level features interact with video features to highlight the video segments relevant to each query phrase, while sentence-level features interact with video features to capture global localization information. We further design a stacked fusion gate module that effectively captures the temporal relationships and semantic information among video segments. The module also introduces a gating mechanism that lets the model adaptively regulate the degree to which video and query features are fused, further improving the accuracy of target-segment prediction. Extensive experiments on the ActivityNet Captions and Charades-STA benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods.
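The abstract does not specify the internals of the stacked fusion gate module, but the gating idea it describes, adaptively weighting how strongly query features are fused into each video segment's representation, can be sketched roughly as follows. All class names, dimensions, and the particular sigmoid-gate formulation are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Minimal sketch of a gated video-query fusion step (illustrative only)."""

    def __init__(self, dim: int):
        super().__init__()
        # Gate and candidate projection are computed from the concatenated
        # video and query features (hypothetical design, not from the paper).
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, video_feat: torch.Tensor, query_feat: torch.Tensor) -> torch.Tensor:
        # video_feat: (batch, num_segments, dim) per-segment video features
        # query_feat: (batch, dim) pooled sentence-level query feature
        query_feat = query_feat.unsqueeze(1).expand_as(video_feat)
        joint = torch.cat([video_feat, query_feat], dim=-1)
        g = self.gate(joint)       # per-segment fusion weights in [0, 1]
        fused = self.proj(joint)   # candidate fused representation
        # The gate adaptively mixes fused features with the original video features,
        # mirroring the "adaptively regulate the fusion degree" behavior described above.
        return g * fused + (1.0 - g) * video_feat
```

In this sketch the gate is computed per segment, so segments that are irrelevant to the query can keep more of their original video representation while relevant segments absorb more query information; stacking several such layers would correspond to the "stacked" aspect mentioned in the abstract.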
