Abstract

Action localization is a central yet challenging task in video analysis. Most existing methods rely heavily on supervised learning, where the action label for each frame must be given beforehand. Unfortunately, in many real applications it is costly and resource-consuming to obtain frame-level action labels for untrimmed videos. In this paper, we propose a novel two-stage paradigm for weakly supervised action localization that requires only video-level action labels. To this end, an Image-to-Video (I2V) network is first developed to transfer the knowledge learned from the image domain (e.g., ImageNet) to the specific video domain. Relying on the model learned by the I2V network, a Video-to-Proposal (V2P) network is then designed to identify action proposals without the need for temporal annotations. Lastly, a proposal selection layer is devised on top of the V2P network to choose the maximal proposal response for each class and thus obtain a video-level prediction score. By minimizing the difference between the prediction score and the video-level label, we fine-tune our V2P network to learn enhanced discriminative ability in classifying proposal inputs. Extensive experimental results show that our method outperforms the state-of-the-art approaches on ActivityNet1.2 and improves the mAP from 13.7% to 16.2% on THUMOS14. More importantly, even with weak supervision, our networks attain accuracy comparable to those employing strong supervision, thus demonstrating the effectiveness of our method.
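
The proposal selection step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `select_max_proposals` and `video_level_loss`, the toy scores, and the use of mean squared error as the "difference" are all assumptions made for illustration, since the abstract does not specify the loss or the network outputs.

```python
from typing import List, Tuple


def select_max_proposals(
    proposal_scores: List[List[float]],
) -> Tuple[List[float], List[int]]:
    """Max-pool per-proposal class scores into one video-level score per class.

    proposal_scores[p][c] is the (hypothetical) score of proposal p for class c.
    Returns (video_scores, best_index), both of length num_classes.
    """
    if not proposal_scores:
        raise ValueError("need at least one proposal")
    num_classes = len(proposal_scores[0])
    video_scores, best_index = [], []
    for c in range(num_classes):
        column = [row[c] for row in proposal_scores]
        # Index of the proposal with the maximal response for class c.
        best = max(range(len(column)), key=column.__getitem__)
        video_scores.append(column[best])
        best_index.append(best)
    return video_scores, best_index


def video_level_loss(video_scores: List[float], labels: List[float]) -> float:
    """Mean squared difference between video-level scores and labels.

    Assumption: the abstract only says "difference"; squared error is a stand-in.
    """
    return sum((s - y) ** 2 for s, y in zip(video_scores, labels)) / len(labels)


# Toy example: 3 proposals, 2 classes; the video is labelled with class 0 only.
scores = [
    [0.25, 0.5],
    [0.75, 0.25],
    [0.5, 0.0],
]
video_scores, best_index = select_max_proposals(scores)
loss = video_level_loss(video_scores, [1.0, 0.0])
```

In this sketch only the selected proposal for each class contributes to the loss, which is why video-level labels alone can, in principle, push individual proposals toward or away from a class without temporal annotations.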
