Abstract

Representations based on local spatial–temporal features are widely used for human action recognition, and many spatial–temporal salient point detectors and descriptors have been proposed. Although promising recognition results have been achieved recently, two significant problems remain: (1) there is no systematic evaluation of local spatial–temporal features on cross-view action recognition; and (2) there is no baseline method for cross-view action recognition that can adaptively bridge the different feature spaces of multiple views in this cross-domain task. In this paper, we evaluate four popular spatial–temporal features (STIP, Cuboids, MoSIFT, and HoG3D) within the framework of transferable dictionary pair learning. The framework first learns a transferable dictionary pair, in either an unsupervised or a supervised setting. Training samples in the source view and testing samples in the target view are then encoded with the corresponding source and target dictionaries, respectively, yielding sparse feature representations that are used to train a classifier for action recognition. In this way, features from different views are mapped into a common feature space, which handles the cross-domain task. The evaluation of the four spatial–temporal features and of the transferable dictionary pair learning framework is carried out on the popular multi-view human action dataset IXMAS. Comparative experiments against representative methods further demonstrate the superiority of this framework for cross-view human action recognition.
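To make the dictionary pair idea concrete, the following is a minimal sketch, not the authors' implementation: all names, synthetic data, and hyper-parameters are invented, and scikit-learn's generic sparse coder stands in for whatever optimizer the paper uses. It relies on the fact that forcing paired source/target videos to share one sparse code is equivalent to learning a single dictionary on their stacked features, which can then be split back into a source dictionary and a target dictionary.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

# Hypothetical correspondence set: n videos observed in both views, each
# described by a d-dimensional bag-of-words histogram over local
# spatial-temporal features (e.g., STIP or Cuboids codewords).
rng = np.random.default_rng(0)
n, d, k = 60, 100, 30            # pairs, feature dim, dictionary size (illustrative)
Y_src = rng.random((n, d))       # source-view features
Y_tgt = rng.random((n, d))       # target-view features

# Shared-code joint dictionary learning:
#   min_{D_s, D_t, Z} ||Y_s - Z D_s||^2 + ||Y_t - Z D_t||^2 + lambda * ||Z||_1
# is solved here by learning one dictionary on the stacked features [Y_s, Y_t].
Y_stacked = np.hstack([Y_src, Y_tgt])            # shape (n, 2d)
learner = DictionaryLearning(n_components=k, alpha=1.0, max_iter=50,
                             transform_algorithm="lasso_lars", random_state=0)
Z = learner.fit_transform(Y_stacked)             # shared sparse codes, shape (n, k)
D_src, D_tgt = learner.components_[:, :d], learner.components_[:, d:]

# Source-view training samples are coded against D_src, target-view test
# samples against D_tgt; both codes live in the same k-dimensional space.
codes_src_train = sparse_encode(Y_src, D_src, algorithm="lasso_lars", alpha=1.0)
codes_tgt_test = sparse_encode(Y_tgt, D_tgt, algorithm="lasso_lars", alpha=1.0)
```

Any standard classifier (e.g., a linear SVM) trained on `codes_src_train` can then be applied directly to `codes_tgt_test`, since the view-specific dictionaries have absorbed the view-dependent appearance differences.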
