Abstract

Temporal sentence grounding (TSG) is an important yet challenging task in video-based information retrieval. Given an untrimmed video, it requires the machine to predict the video segment of interest that is semantically related to a given sentence query. Most existing TSG methods train well-designed deep networks on large amounts of data to align the semantics of video-query pairs for activity grounding. However, we argue that these works easily capture the selection biases of video-query pairs in a dataset rather than showing robust reasoning abilities to handle rarely appearing pairs (i.e., few-shot contents). To alleviate this limitation, which stems from the imbalanced data distribution during network training, we propose a novel memory-augmented network called Memory-Guided Semantic Learning Network (MGSL-Net) to handle the few-shot TSG task and enhance the generalization ability of the model. Specifically, given a matched video-query input, we first employ a graph attentive cross-modal interaction module to align their semantics in a cycle-consistent manner. Then, we develop memory modules in both the video and query domains to record the cross-modal shared semantic features in domain-specific persistent memory. Finally, a heterogeneous attention module is utilized to integrate the memory-enhanced multi-modal features in both the video and query domains with further feature calibration. During training, the memory modules are dynamically associated with both common and rare cases to memorize all contents that appear, alleviating the issue of forgetting the few-shot contents. Therefore, in testing, the rare cases can be enhanced by retrieving the stored memories, improving the generalization ability of the model. Experimental results on three benchmarks (ActivityNet Caption, Charades-STA and TACoS) show the superiority of our method in both effectiveness and efficiency.
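
The abstract does not specify how the memory modules are read or updated, so the following is only a minimal sketch of the general idea of a persistent memory that is written during training and queried at test time. It assumes dot-product attention for retrieval and a moving-average update of the best-matching slot; the names `memory_read`, `memory_write`, and `rate` are hypothetical and are not taken from MGSL-Net.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(feature, memory):
    """Enhance a feature with an attention-weighted sum of memory slots.

    Assumption: similarity is a plain dot product and the retrieved
    vector is added residually to the input feature.
    """
    weights = softmax([dot(feature, slot) for slot in memory])
    retrieved = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(len(feature))
    ]
    enhanced = [f + r for f, r in zip(feature, retrieved)]
    return enhanced, weights


def memory_write(feature, memory, rate=0.1):
    """Move the best-matching slot toward the feature (in place).

    Assumption: a simple moving-average update with a hypothetical
    step size `rate`; returns the index of the updated slot.
    """
    scores = [dot(feature, slot) for slot in memory]
    best = scores.index(max(scores))
    memory[best] = [
        (1 - rate) * s + rate * f for s, f in zip(memory[best], feature)
    ]
    return best
```

In this sketch, a rare input seen at test time would still attend to whichever slot was shaped by similar training examples, which is the intuition the abstract gives for retrieving stored memories.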
