Abstract
Human action recognition in videos has attracted considerable research interest over the past decade. Since the introduction of the Kinect RGB-D camera, action recognition from depth videos has drawn growing attention. Many feature extraction methods have been proposed, including skeleton features and point cloud features. These rich features have their own advantages and are complementary. In this work, we propose to combine three kinds of RGB-D features, namely a local spatial-temporal feature (RGB), a skeleton joint feature, and a point cloud feature, based on sparse coding to improve action recognition performance. We adopt three schemes to fuse the features and to classify test samples by sparse coding. We carry out experiments to examine how much each feature contributes to action recognition. The results show that fusing RGB-D features improves performance and outperforms state-of-the-art methods.
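The sketch below is not the authors' implementation; it only illustrates, under assumed fixed-length feature vectors per modality, how feature-level fusion and sparse-coding-based classification (SRC-style, via an L1-regularized reconstruction over the training dictionary) can be wired together. Function names such as `fuse_features` and `src_predict` are hypothetical.

```python
# Minimal sketch of sparse-coding classification with feature-level fusion.
# Assumes each modality (RGB, skeleton, point cloud) is already encoded as a
# fixed-length vector per video clip; names here are illustrative only.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import normalize

def fuse_features(*feature_blocks):
    """Concatenate L2-normalized feature blocks from the different modalities."""
    return np.hstack([normalize(block) for block in feature_blocks])

def src_predict(X_train, y_train, x_test, alpha=0.01):
    """Classify one test vector by its sparse code over the training dictionary."""
    D = X_train.T                          # dictionary: one column per training sample
    coder = Lasso(alpha=alpha, max_iter=5000)
    coder.fit(D, x_test)                   # solve min ||x - D a||^2 + alpha * ||a||_1
    a = coder.coef_                        # sparse code of the test sample
    residuals = {}
    for c in np.unique(y_train):
        a_c = np.where(y_train == c, a, 0.0)          # keep only class-c coefficients
        residuals[c] = np.linalg.norm(x_test - D @ a_c)
    return min(residuals, key=residuals.get)          # class with smallest residual
```

In this fusion scheme the per-modality vectors are concatenated before coding; an alternative, equally plausible reading of "three schemes" is to code each modality separately and combine the class-wise residuals, which only changes where `fuse_features` is applied.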