Abstract

In this paper, we tackle the task of natural language video localization (NLVL): given an untrimmed video and a natural language query, the goal is to localize the temporal segment within the video that best matches the query. NLVL is challenging because it lies at the intersection of language and video understanding: a video may contain multiple segments of interest, and the language may describe complicated temporal dependencies. Though existing approaches have achieved good performance, most of them do not fully consider the inherent differences between the language and video modalities. Here, we propose the Moment Relation Network (MRN) to reduce the divergence between the probability distributions of these two modalities. Specifically, MRN trains video and language subnets and then uses transfer learning techniques to map the extracted features into a shared embedding space, where we compute the similarity of the two modalities with a Mahalanobis distance metric that is used to localize moments. Extensive experiments on benchmark datasets show that the proposed MRN outperforms the state-of-the-art by a large margin under widely used metrics.
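To make the cross-modal scoring step concrete, below is a minimal sketch of how features from a video subnet and a language subnet could be projected into a shared embedding space and compared with a learned Mahalanobis metric. All names, dimensions, and the M = L^T L parameterization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_video, d_lang, d_embed = 1024, 300, 256  # hypothetical feature dimensions

# Stand-ins for the learned projection weights of the two subnets.
W_video = rng.standard_normal((d_video, d_embed)) * 0.01
W_lang = rng.standard_normal((d_lang, d_embed)) * 0.01

# Parameterize the Mahalanobis metric as M = L^T L so that M stays
# positive semi-definite while L is learned.
L = rng.standard_normal((d_embed, d_embed)) * 0.01

def mahalanobis_distance(x, y, L):
    """Distance sqrt((x - y)^T L^T L (x - y)) under the learned metric."""
    diff = L @ (x - y)
    return np.sqrt(diff @ diff)

# Placeholder features for one candidate moment and one query.
video_feat = rng.standard_normal(d_video)
query_feat = rng.standard_normal(d_lang)

v = video_feat @ W_video  # video embedding in the shared space
q = query_feat @ W_lang   # language embedding in the shared space

# Smaller distance means higher similarity; the candidate moment whose
# embedding is closest to the query embedding would be localized.
score = -mahalanobis_distance(v, q, L)
print(score)
```

In practice, this scoring would be applied over all candidate moments in the video, with the highest-scoring segment returned as the localization result.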
