Abstract

Human action recognition in videos has attracted significant research interest over the past decade. Since the introduction of the Kinect RGB-D camera, action recognition from depth videos has received considerable attention in recent years. Many feature extraction methods have been proposed, including skeleton features and point cloud features. These rich features have distinct advantages and are complementary. In this work, we propose to combine three kinds of RGB-D features, namely a local spatio-temporal feature (RGB), a skeleton joint feature, and a point cloud feature, based on sparse coding to improve action recognition performance. We adopt three schemes for fusing the features and classifying test samples with sparse coding. We carry out experiments to evaluate how much each feature contributes to action recognition. In addition, the results show that the fusion of RGB-D features improves performance and outperforms state-of-the-art methods.
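To make the idea concrete, below is a minimal sketch of one plausible realization: the three per-video descriptors are concatenated into a fused vector, and a test sample is classified by sparse-representation classification over a dictionary of training descriptors. The feature dimensions, the use of scikit-learn's Lasso solver, and the single concatenation-based fusion scheme are illustrative assumptions, not the paper's exact three fusion schemes.

```python
# Hypothetical sketch: feature-level fusion of RGB-D descriptors followed by
# sparse-representation classification (assign the class with the smallest
# class-wise reconstruction residual). Dimensions and solver are assumptions.
import numpy as np
from sklearn.linear_model import Lasso


def fuse_features(rgb, skeleton, cloud):
    """Concatenate L2-normalized descriptors into one fused vector."""
    parts = []
    for f in (rgb, skeleton, cloud):
        f = np.asarray(f, dtype=float)
        parts.append(f / (np.linalg.norm(f) + 1e-12))
    return np.concatenate(parts)


def src_classify(dictionary, labels, query, alpha=0.01):
    """Code the query over the training dictionary with an L1 penalty,
    then pick the class whose atoms yield the smallest residual."""
    # dictionary: (d, n) matrix whose columns are fused training descriptors
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    lasso.fit(dictionary, query)
    x = lasso.coef_  # sparse code of the test sample

    best_label, best_residual = None, np.inf
    for c in np.unique(labels):
        mask = labels == c  # keep only coefficients belonging to class c
        residual = np.linalg.norm(query - dictionary[:, mask] @ x[mask])
        if residual < best_residual:
            best_label, best_residual = c, residual
    return best_label


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 20 training videos, 2 action classes, 3 descriptor types.
    train = [fuse_features(rng.normal(size=64), rng.normal(size=32),
                           rng.normal(size=48)) for _ in range(20)]
    D = np.stack(train, axis=1)             # (d, n) training dictionary
    y = np.array([0] * 10 + [1] * 10)       # class label of each column
    q = fuse_features(rng.normal(size=64), rng.normal(size=32),
                      rng.normal(size=48))  # fused test descriptor
    print("predicted class:", src_classify(D, y, q))
```

Concatenation before coding is only one option; sparse codes can also be computed per feature and combined at the residual or decision level, which is the kind of variation the three fusion schemes in the paper compare.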
