TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection

Hao Sun, Wenjing Chen, Mingyao Zhou, Wei Xie

doi:10.1609/aaai.v38i5.28304

Abstract

Video moment retrieval (MR) and highlight detection (HD) based on natural language queries are two closely related tasks that aim, respectively, to localize the moments in a video relevant to a query and to predict a highlight score for each video clip. Recently, several methods have built DETR-based networks to solve MR and HD jointly. These methods simply add two separate task heads after multi-modal feature extraction and feature interaction, and they achieve good performance. Nevertheless, these approaches underutilize the reciprocal relationship between the two tasks. In this paper, we propose a task-reciprocal transformer based on DETR (TR-DETR) that focuses on exploring the inherent reciprocity between MR and HD. Specifically, a local-global multi-modal alignment module is first built to align features from diverse modalities into a shared latent space. Subsequently, a visual feature refinement module is designed to eliminate query-irrelevant information from visual features for modal interaction. Finally, a task cooperation module is constructed to refine the retrieval pipeline and the highlight score prediction process by utilizing the reciprocity between MR and HD. Comprehensive experiments on the QVHighlights, Charades-STA and TVSum datasets demonstrate that TR-DETR outperforms existing state-of-the-art methods. Code is available at https://github.com/mingyao1120/TR-DETR.

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Mar 24, 2024
Citations: 4

Similar Papers

Cascaded MPN: Cascaded Moment Proposal Network for Video Corpus Moment Retrieval
Sunjae Yoon ... Junyeong Kim
IEEE Access | VOL. 10
01 Jan 2021

Multiple cross-attention for video-subtitle moment retrieval
Hao Fu ... Hongxing Wang
Pattern Recognition Letters | VOL. 156
01 Apr 2022

Learning Video Moment Retrieval Without a Single Annotated Video
Junyu Gao ... Changsheng Xu
IEEE Transactions on Circuits and Systems for Video Technology | VOL. 32
01 Mar 2022

Weakly-Supervised Video Moment Retrieval via Semantic Completion Network
Zhijie Lin ... Qi Wang
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 34
03 Apr 2020
