Abstract

The goal of cross-modal moment localization is to find the temporal moment in an untrimmed video that semantically corresponds to a natural language query. Most current approaches learn cross-modal moment localization models from fine-grained temporal annotations of the video, which are extremely time-consuming and labor-intensive to obtain. In this paper, we propose a novel framework for weakly supervised cross-modal moment localization that incorporates a proposal generation module and a semantic reconstruction module. The proposal generation module models cross-modal video representations on a two-dimensional temporal feature map, which encodes the moment-wise temporal relationships among moment candidates. Based on the generated proposals, the semantic reconstruction module assesses each proposal's capacity to restore the text query, which provides weak supervision for network training. In addition, a punishment loss is proposed to further eliminate the effect of the invalid area of the temporal map. Extensive experimental results show that the proposed method achieves state-of-the-art performance, demonstrating its effectiveness for weakly supervised moment localization with natural language.

Highlights

• A new framework is proposed for weakly supervised cross-modal moment localization.
• A multi-task loss is designed for the weakly supervised optimization of the network.
• The proposal generation module exploits moment-wise temporal relationships.
• Extensive experimental results demonstrate the effectiveness of our method.
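To make the two-dimensional temporal feature map concrete, the sketch below builds one from per-clip features: cell (i, j) holds the pooled feature of the candidate moment spanning clips i through j, and only the upper triangle (start ≤ end) is valid. This is a minimal illustration under our own assumptions (PyTorch, mean pooling, the shown shapes), not the authors' implementation.

```python
import torch

def build_2d_moment_map(clip_feats: torch.Tensor):
    """clip_feats: (N, D) per-clip features -> (N, N, D) moment map and (N, N) validity mask."""
    n, d = clip_feats.shape
    # Prefix sums give O(1) mean pooling over any clip span i..j.
    prefix = torch.cat([torch.zeros(1, d), clip_feats.cumsum(dim=0)], dim=0)
    moment_map = torch.zeros(n, n, d)
    valid_mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        for j in range(i, n):
            # Cell (i, j): mean feature of the candidate moment covering clips i..j.
            moment_map[i, j] = (prefix[j + 1] - prefix[i]) / (j - i + 1)
            valid_mask[i, j] = True  # upper triangle only: start <= end
    return moment_map, valid_mask

clip_feats = torch.randn(16, 512)          # e.g. 16 clips with 512-d features
moment_map, valid_mask = build_2d_moment_map(clip_feats)
print(moment_map.shape, valid_mask.sum())  # torch.Size([16, 16, 512]), 136 valid cells
```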
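The abstract does not spell out the punishment loss, but one plausible reading is a term that drives predicted proposal scores toward zero on the invalid (lower-triangle) cells of the map. The function below is a hypothetical sketch of such a term; the name, the squared-score form, and the mask convention are all assumptions, not the paper's definition.

```python
import torch

def punishment_loss(score_map: torch.Tensor, valid_mask: torch.Tensor) -> torch.Tensor:
    """score_map: (N, N) proposal scores in [0, 1]; valid_mask: (N, N) bool.

    Penalizes any score mass on invalid cells (end index before start index),
    suppressing the effect of the invalid area during training.
    """
    invalid = ~valid_mask
    if not invalid.any():
        return score_map.new_zeros(())
    return score_map[invalid].pow(2).mean()

# Toy usage: a 4x4 map where only the upper triangle (start <= end) is valid.
scores = torch.rand(4, 4)
mask = torch.triu(torch.ones(4, 4, dtype=torch.bool))
print(punishment_loss(scores, mask))  # mean squared score over the 6 invalid cells
```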
