Weakly Supervised Gaussian Networks for Action Detection

Basura Fernando, Cheston Tan Yin Chet, Hakan Bilen

doi:10.1109/wacv45572.2020.9093263

Abstract

Detecting the temporal extents of human actions in videos is a challenging computer vision problem that requires detailed manual supervision, including frame-level labels. This expensive annotation process limits the deployment of action detectors to a small number of categories. We propose a novel method, called WSGN, that learns to detect actions from weak supervision, using only video-level labels. WSGN learns to exploit both video-specific and dataset-wide statistics to predict the relevance of each frame to an action category. This strategy leads to significant gains in action detection on two standard benchmarks, THUMOS14 and Charades. On THUMOS14, our method obtains excellent results compared to state-of-the-art methods that use similar features and loss functions. Similarly, on the challenging Charades dataset, our weakly supervised method is only 0.3% mAP behind a state-of-the-art supervised method for action localization.
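The abstract does not give WSGN's formulation. What follows is only a minimal Python sketch of the general weakly supervised pattern it describes. Per-frame scores for one action class are turned into frame relevance weights using two terms: a within-video normalization (video-specific) and statistics pooled over the training set (dataset-wide). The relevance-weighted video score is then trained against a video-level label. Every detail below is a hypothetical assumption and not the WSGN architecture from the paper. That covers the helper names, the use of softmax for the video term, the z-score plus sigmoid for the dataset term, and the product used to combine them.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dataset_stats(videos: List[List[float]]) -> Tuple[float, float]:
    # Hypothetical dataset-wide statistics: mean and std of one class's
    # frame scores pooled over every training video.
    flat = [s for video in videos for s in video]
    mean = sum(flat) / len(flat)
    var = sum((s - mean) ** 2 for s in flat) / len(flat)
    return mean, math.sqrt(var) + 1e-6  # epsilon avoids division by zero


def frame_relevance(scores: List[float], mean: float, std: float) -> List[float]:
    # Video-specific term: each frame relative to the other frames of this video.
    video_term = softmax(scores)
    # Dataset-wide term: each frame's score relative to the training-set statistics.
    dataset_term = [sigmoid((s - mean) / std) for s in scores]
    # Hypothetical combination: elementwise product, renormalized to sum to 1.
    combined = [v * d for v, d in zip(video_term, dataset_term)]
    total = sum(combined)
    return [c / total for c in combined]


def video_logit(scores: List[float], weights: List[float]) -> float:
    # Relevance-weighted pooling of frame scores into a single video-level score.
    return sum(w * s for w, s in zip(weights, scores))


def bce_loss(logit: float, label: int) -> float:
    # Binary cross-entropy against the video-level label (1 = action present).
    p = sigmoid(logit)
    eps = 1e-12
    return -(label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps))


# Usage with made-up frame scores for a single action class.
train_videos = [[0.1, 2.0, 3.0, 0.2], [0.0, 0.5, 0.3]]
mean, std = dataset_stats(train_videos)
scores = train_videos[0]
weights = frame_relevance(scores, mean, std)
loss = bce_loss(video_logit(scores, weights), label=1)
```

Multiplying the two terms gives a frame a high weight only when it stands out both within its own video and relative to the training set. This is one illustrative way to combine video-specific and dataset-wide information, not a description of how WSGN does it. At test time, the same frame weights could be thresholded to produce temporal detections.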
