Abstract

Generating discriminative representations of video clips is of vital importance for human action recognition, especially in complex action scenarios. In this paper, we introduce Overcomplete Independent Component Analysis (OICA) to learn structural spatio-temporal features directly from raw video data. As an unsupervised learning method, OICA can fully exploit unlabeled videos, which is crucial for action recognition because labeling huge volumes of video data is prohibitively labor-intensive in practice. In addition, the features learned by OICA can describe complex actions in sufficient detail, owing to the overcompleteness and independence constraints imposed on the component bases. Furthermore, inspired by the layered structure of deep neural networks, we propose stacking OICA to form a two-layer network that abstracts robust high-level features; this stacking proves effective in practice for boosting recognition accuracy. We evaluate the proposed stacked OICA network on four benchmark datasets (Hollywood2, YouTube, UCF Sports and KTH), which cover both simple and complex action scenarios. The experimental results show that our method consistently outperforms the baselines and achieves state-of-the-art performance.
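The abstract does not spell out the OICA training procedure, but the core idea (learning more independent component bases than input dimensions) can be illustrated with a generic overcomplete-ICA sketch: gradient ascent on a non-Gaussianity measure with soft quasi-orthogonalization of the basis rows. Everything here (the `tanh` nonlinearity, the decorrelation step, the synthetic Laplacian data) is an illustrative assumption, not the authors' actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_components, n_samples = 8, 16, 2000  # overcomplete: 16 bases for 8 dims

# Synthetic super-Gaussian (sparse) data standing in for raw video patches.
X = rng.laplace(size=(n_features, n_samples))
X -= X.mean(axis=1, keepdims=True)

# Whiten the data, a standard preprocessing step for ICA-family methods.
cov = X @ X.T / n_samples
d, E = np.linalg.eigh(cov)
X = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X

# Overcomplete basis: more rows (components) than input dimensions.
W = rng.normal(size=(n_components, n_features))
W /= np.linalg.norm(W, axis=1, keepdims=True)

for _ in range(200):
    S = W @ X                         # component responses
    G = np.tanh(S)                    # score of a sparse (log-cosh) prior
    W += 0.1 * (G @ X.T) / n_samples  # ascend non-Gaussianity of responses
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    W = 1.5 * W - 0.5 * (W @ W.T) @ W # soft decorrelation (quasi-orthogonalization)
    W /= np.linalg.norm(W, axis=1, keepdims=True)

features = W @ X  # overcomplete, approximately independent feature responses
```

A stacked variant, as described in the abstract, would feed pooled first-layer responses into a second OICA layer trained the same way.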

