Abstract

Action recognition plays an important role in numerous video-related tasks. Although previous works often learn appearance and motion information with Convolutional Neural Networks (CNNs), they ignore the corresponding spatial structures of the video representation. In this work, we address the action recognition task with a Spatial-Temporal representation analysis algorithm Across Grassmannian manifold and Euclidean space (ST-AGE), which considers the appearance and motion information of video samples in a unified framework. For each video sample, we extract appearance features with classical CNNs (e.g., ConvNet, VGG, ResNet) and motion representations with a trajectory tracking method. Both spatial and temporal information can then be analyzed by embedding them on the Grassmannian manifold and in Euclidean space, and an appropriate multi-kernel SVM is further applied. Comprehensive evaluations on the HMDB-51 and UCF-101 datasets demonstrate the significant superiority of ST-AGE over other state-of-the-art methods for human action recognition.
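To make the described pipeline concrete, the sketch below shows one plausible way to combine a Grassmannian kernel with a Euclidean kernel in a multi-kernel SVM. It is a minimal illustration under assumed choices, not the authors' exact ST-AGE formulation: the projection kernel, the RBF kernel on motion descriptors, the mixing weight `w`, the subspace dimension `k`, and all feature shapes are hypothetical placeholders standing in for the real CNN and trajectory features.

```python
# Illustrative sketch only: a simplified multi-kernel SVM that combines a
# Grassmannian projection kernel (on subspaces spanned by per-video CNN
# feature matrices) with an RBF kernel on Euclidean motion descriptors.
# Feature extractors, kernel weights, and data shapes are hypothetical.
import numpy as np
from sklearn.svm import SVC

def subspace(features, k=10):
    """Orthonormal basis (d x k) of the top-k left singular vectors of a
    (d x n_frames) per-video feature matrix -> a point on the Grassmannian."""
    u, _, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :k]

def grassmann_projection_kernel(bases_a, bases_b):
    """K[i, j] = ||A_i^T B_j||_F^2, the projection kernel on the Grassmannian."""
    k = np.zeros((len(bases_a), len(bases_b)))
    for i, a in enumerate(bases_a):
        for j, b in enumerate(bases_b):
            k[i, j] = np.linalg.norm(a.T @ b, 'fro') ** 2
    return k

def rbf_kernel(x_a, x_b, gamma=1e-3):
    """Standard RBF kernel on Euclidean (e.g., trajectory/motion) descriptors."""
    d2 = ((x_a[:, None, :] - x_b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# --- toy data standing in for real CNN and trajectory features ---
rng = np.random.default_rng(0)
n_train, n_test, d_cnn, n_frames, d_motion = 40, 10, 128, 30, 64
cnn_train = [rng.normal(size=(d_cnn, n_frames)) for _ in range(n_train)]
cnn_test = [rng.normal(size=(d_cnn, n_frames)) for _ in range(n_test)]
motion_train = rng.normal(size=(n_train, d_motion))
motion_test = rng.normal(size=(n_test, d_motion))
y_train = np.arange(n_train) % 2  # dummy binary labels

bases_train = [subspace(f) for f in cnn_train]
bases_test = [subspace(f) for f in cnn_test]

w = 0.5  # kernel mixing weight (a tunable hyperparameter)
K_train = w * grassmann_projection_kernel(bases_train, bases_train) \
          + (1 - w) * rbf_kernel(motion_train, motion_train)
K_test = w * grassmann_projection_kernel(bases_test, bases_train) \
         + (1 - w) * rbf_kernel(motion_test, motion_train)

clf = SVC(kernel='precomputed').fit(K_train, y_train)
print(clf.predict(K_test))
```

In this sketch the spatial (appearance) stream is summarized as a linear subspace per video, so videos are compared on the Grassmannian, while the motion descriptors remain Euclidean vectors compared with an RBF kernel; the two kernels are blended and passed to an SVM with a precomputed kernel.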
