Abstract

Action recognition plays an important role in numerous video-related tasks. Although previous works often learn appearance and motion information with Convolutional Neural Networks (CNNs), they ignore the corresponding spatial structures of the video representation. In this work, we address the action recognition task with a Spatial-Temporal representation analysis algorithm Across Grassmannian manifold and Euclidean space (ST-AGE), which considers the appearance and motion information of video samples in a unified framework. For each video sample, we extract appearance features with classical CNNs (e.g., ConvNet, VGG, ResNet) and motion representations with a trajectory tracking method. Both spatial and temporal information can then be analyzed by embedding them on the Grassmannian manifold and in Euclidean space, and an appropriate multi-kernel SVM is further applied. Comprehensive evaluations on the HMDB-51 and UCF-101 datasets demonstrate the significant superiority of ST-AGE over other state-of-the-art methods for human action recognition.
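To make the described pipeline concrete, the sketch below shows one plausible way to combine a Grassmannian kernel with a Euclidean kernel in a multi-kernel SVM. It is a minimal illustration under assumed choices, not the authors' exact ST-AGE formulation: the projection kernel, the RBF kernel on motion descriptors, the mixing weight `w`, the subspace dimension `k`, and all feature shapes are hypothetical placeholders standing in for the real CNN and trajectory features.

```python
# Illustrative sketch only: a simplified multi-kernel SVM that combines a
# Grassmannian projection kernel (on subspaces spanned by per-video CNN
# feature matrices) with an RBF kernel on Euclidean motion descriptors.
# Feature extractors, kernel weights, and data shapes are hypothetical.
import numpy as np
from sklearn.svm import SVC

def subspace(features, k=10):
    """Orthonormal basis (d x k) of the top-k left singular vectors of a
    (d x n_frames) per-video feature matrix -> a point on the Grassmannian."""
    u, _, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :k]

def grassmann_projection_kernel(bases_a, bases_b):
    """K[i, j] = ||A_i^T B_j||_F^2, the projection kernel on the Grassmannian."""
    k = np.zeros((len(bases_a), len(bases_b)))
    for i, a in enumerate(bases_a):
        for j, b in enumerate(bases_b):
            k[i, j] = np.linalg.norm(a.T @ b, 'fro') ** 2
    return k

def rbf_kernel(x_a, x_b, gamma=1e-3):
    """Standard RBF kernel on Euclidean (e.g., trajectory/motion) descriptors."""
    d2 = ((x_a[:, None, :] - x_b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# --- toy data standing in for real CNN and trajectory features ---
rng = np.random.default_rng(0)
n_train, n_test, d_cnn, n_frames, d_motion = 40, 10, 128, 30, 64
cnn_train = [rng.normal(size=(d_cnn, n_frames)) for _ in range(n_train)]
cnn_test = [rng.normal(size=(d_cnn, n_frames)) for _ in range(n_test)]
motion_train = rng.normal(size=(n_train, d_motion))
motion_test = rng.normal(size=(n_test, d_motion))
y_train = np.arange(n_train) % 2  # dummy binary labels

bases_train = [subspace(f) for f in cnn_train]
bases_test = [subspace(f) for f in cnn_test]

w = 0.5  # kernel mixing weight (a tunable hyperparameter)
K_train = w * grassmann_projection_kernel(bases_train, bases_train) \
          + (1 - w) * rbf_kernel(motion_train, motion_train)
K_test = w * grassmann_projection_kernel(bases_test, bases_train) \
         + (1 - w) * rbf_kernel(motion_test, motion_train)

clf = SVC(kernel='precomputed').fit(K_train, y_train)
print(clf.predict(K_test))
```

In this sketch the spatial (appearance) stream is summarized as a linear subspace per video, so videos are compared on the Grassmannian, while the motion descriptors remain Euclidean vectors compared with an RBF kernel; the two kernels are blended and passed to an SVM with a precomputed kernel.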
