Abstract

Video-query based video moment retrieval (VQ-VMR) aims to localize the segment in a reference video that semantically matches a short query video. The rapid growth of online video services makes this task increasingly important, yet it remains challenging. To accurately retrieve the target moment, we propose a new metric that effectively assesses the semantic relevance between the query video and segments of the reference video. We also develop a new VQ-VMR framework to discover the intrinsic semantic relevance between a pair of input videos. It comprises two key components: a Fine-grained Feature Interaction (FFI) module and a Semantic Relevance Measurement (SRM) module, which together handle both the spatial and temporal dimensions of videos. First, the FFI module computes the semantic similarity between videos at the local frame level, mainly capturing spatial information. The SRM module then learns the similarity between videos from a global perspective, taking temporal information into account. Extensive experiments on two benchmark datasets demonstrate noticeable improvements of the proposed approach over state-of-the-art methods.
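To make the two-module design concrete, below is a minimal PyTorch-style sketch of how an FFI-then-SRM pipeline could be wired together. All names and layer choices (the linear projection, cosine similarity, max-pooling over query frames, and GRU aggregation) are illustrative assumptions; the abstract only specifies that FFI computes frame-level (spatial) similarity and SRM scores global (temporal) relevance.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FineGrainedFeatureInteraction(nn.Module):
    # Hypothetical FFI sketch: pairwise frame-level (spatial) similarity
    # between query-video and reference-segment features.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)  # shared projection (an assumption)

    def forward(self, query_feats, ref_feats):
        # query_feats: (Tq, D) per-frame features of the query video
        # ref_feats:   (Tr, D) per-frame features of a candidate segment
        q = F.normalize(self.proj(query_feats), dim=-1)
        r = F.normalize(self.proj(ref_feats), dim=-1)
        return q @ r.t()  # (Tq, Tr) frame-to-frame cosine similarity map


class SemanticRelevanceMeasurement(nn.Module):
    # Hypothetical SRM sketch: aggregates the frame-level similarity map
    # over time into a single global relevance score.
    def __init__(self, hidden=64):
        super().__init__()
        self.temporal = nn.GRU(input_size=1, hidden_size=hidden,
                               batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, sim_map):
        # For each reference frame, keep its best-matching query frame,
        # then model the resulting sequence temporally with a GRU.
        evidence = sim_map.max(dim=0).values.view(1, -1, 1)  # (1, Tr, 1)
        _, h = self.temporal(evidence)
        return self.score(h[-1]).squeeze(-1)  # scalar semantic relevance


# Usage: score one candidate segment against the query video.
ffi = FineGrainedFeatureInteraction(512)
srm = SemanticRelevanceMeasurement()
query, segment = torch.randn(16, 512), torch.randn(40, 512)
relevance = srm(ffi(query, segment))
print(relevance)  # higher score = stronger semantic match
```

In this sketch the FFI output is a similarity map rather than a single number, which is what lets the SRM stage reason over temporal structure instead of comparing pooled video-level embeddings; the specific aggregation shown here is only one plausible instantiation of that idea.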
