Abstract

Temporal action proposal generation aims to produce temporal video segments that contain human actions in untrimmed videos, and it is a preliminary step for video understanding tasks such as action localization and temporal description grounding. Fully-supervised solutions, though proven effective, suffer from heavy data annotation overhead. To address this problem, this paper focuses on a rarely investigated yet practical problem: semi-supervised learning for temporal action proposal generation. First, we propose a <i>Proposal Map oriented Mean-Teacher</i> (PM-MT) model, which can use both labeled and unlabeled data for end-to-end model training. Second, we design a <i>Suppression-and-Re-Generation</i> (SRG) strategy to generate high-quality pseudo labels for unlabeled data, which are then used to finetune the model. Extensive experiments demonstrate the effectiveness of the proposed method: it achieves state-of-the-art results on two public benchmark datasets for semi-supervised action proposal generation and outperforms fully-supervised learning methods while using only a portion of the labeled data.
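
For readers unfamiliar with mean-teacher training, the following is a minimal sketch of the generic scheme, not the PM-MT model itself, whose details the abstract does not give. It assumes that the teacher's weights track an exponential moving average of the student's weights and that a consistency loss compares the two models' predictions on unlabeled data. The function names, the flat-list representation of weights and proposal maps, and the `decay` value are all hypothetical choices made for illustration.

```python
def ema_update(teacher, student, decay=0.99):
    """Return teacher weights moved toward the student weights.

    Both arguments are flat lists of floats (an assumption made here
    to keep the sketch free of any deep-learning framework).
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_map, teacher_map):
    """Mean squared difference between two flattened proposal maps."""
    n = len(student_map)
    return sum((s - t) ** 2 for s, t in zip(student_map, teacher_map)) / n


# Hypothetical toy values: two weights and a two-cell proposal map.
teacher_weights = [0.0, 1.0]
student_weights = [1.0, 1.0]
teacher_weights = ema_update(teacher_weights, student_weights, decay=0.9)

loss = consistency_loss([0.5, 0.5], [0.5, 1.0])
```

In such a scheme the consistency term can be computed on unlabeled videos, which is what lets them contribute to training alongside the supervised loss on labeled videos.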
