Adaptive proposal network based on generative adversarial learning for weakly supervised temporal sentence grounding

Weikang Wang, Yuting Su, Jing Liu, Peiguang Jing

doi:10.1016/j.patrec.2024.01.018

Abstract

Temporal sentence grounding aims to locate the moment most relevant to a given natural language query. Because labeling temporal bounding boxes is time-consuming, recent work has focused on weakly supervised temporal sentence grounding (WTSG), which uses only video-text pairwise annotations. Existing WTSG methods mainly adopt an anchor-based structure to generate moment candidates and train the network with a triplet loss between positive and negative samples, so their performance is strongly affected by the preset anchors and the loss margin. In this paper, we propose a novel contrastive generative adversarial learning method with adaptive box generation for WTSG tasks. Specifically, temporal proposals are adaptively generated by a transformer-based box generator in a completely anchor-free manner. In addition, a novel contrastive generative adversarial learning process is proposed for network optimization, which effectively encourages the separation of positive and negative samples without a preset margin value. Extensive experiments indicate that our method achieves state-of-the-art performance on both the Charades-STA and ActivityNet Captions datasets.

Full Text