Abstract
Weakly supervised temporal action localization (WTAL) aims to classify and localize actions in untrimmed videos with only video-level labels. Recent studies have attempted to obtain more accurate temporal boundaries by exploiting latent action instances in ambiguous snippets or propagating representative action features. However, empirically handcrafted ambiguous snippet extraction and the imprecise alignment of representative snippet propagation lead to challenges in modeling the completeness of actions for these methods. In this article, we propose a Discriminative Action Snippet Propagation Network (DASP-Net) to accurately discover ambiguous snippets in videos and propagate discriminative instance-level features throughout the video for improving action completeness. Specifically, we introduce a novel discriminative feature propagation module for capturing the global contextual attention and propagating the action concept across the whole video by perceiving the discriminative action snippets with instance information from the same video. Simultaneously, we incorporate denoised pseudo-labels as supervision, where we correct the controversial prediction based on the feature space distribution during training, thereby alleviating false detection caused by noise background features. Furthermore, we design an ambiguous feature mining module, which maximizes the feature affinity information of action and background in ambiguous snippets to generate more accurate latent action and background snippets and learns more precise action instance boundaries through contrastive learning of action and background snippets. Extensive experiments show that DASP-Net achieves state-of-the-art results on THUMOS14 and ActivityNet1.2 datasets.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: ACM Transactions on Multimedia Computing, Communications, and Applications
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.