Abstract
Automatic human action recognition plays an important role in many real-world applications, such as video surveillance, virtual reality, and intelligent human-computer interaction. Spatial complexity and temporal variability are the main challenges to be addressed. Most traditional methods rely on hand-crafted video features, which limits their expressive power and their ability to generalize. In recent years, with the rise of deep networks, deep learning methods have been applied to automatic human action recognition and have achieved better performance. In this paper we present a novel Convolutional Neural Network (CNN) based automatic human action recognition method that automatically learns the spatial and temporal characteristics of the data to improve recognition performance. Specifically, we preprocess the dataset by extracting keyframes with the inter-frame difference method, which reduces data redundancy while preserving the spatiotemporal characteristics of the data; we then use the real-time keypoint recognition system OpenPose to obtain skeleton information, consisting of human joint points that serve as the input features of our recognition model. For model training, we use the large UCF-101 dataset, a common benchmark in this field. For model evaluation, we compare our method with state-of-the-art methods; the experimental results show that our method achieves a significant performance improvement on UCF-101. Finally, based on the model we implement a system that uses a Kinect V2 to record human action in a real environment. Our system can automatically mark the range of human action and output the corresponding action labels in real time.
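The keyframe-extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes frames are given as grayscale NumPy arrays, and the function name `extract_keyframes` and the difference threshold are hypothetical choices for the example.

```python
import numpy as np

def extract_keyframes(frames, threshold=10.0):
    """Inter-frame difference method (illustrative sketch):
    keep a frame when its mean absolute pixel difference from the
    previously kept frame exceeds `threshold`, reducing redundancy
    while retaining frames where the scene changes. Returns the
    indices of the selected keyframes."""
    if not frames:
        return []
    keyframes = [0]  # the first frame is always kept as a reference
    last = frames[0].astype(np.float32)
    for i, frame in enumerate(frames[1:], start=1):
        cur = frame.astype(np.float32)
        # mean absolute difference against the last kept keyframe
        if np.mean(np.abs(cur - last)) > threshold:
            keyframes.append(i)
            last = cur  # update the reference to the new keyframe
    return keyframes

# Example on synthetic frames: two identical dark frames, then a bright one.
frames = [np.zeros((4, 4)), np.zeros((4, 4)), np.full((4, 4), 50.0)]
print(extract_keyframes(frames))  # the redundant second frame is dropped
```

In practice the threshold would be tuned to the dataset, and the same idea applies to color frames by differencing each channel or converting to grayscale first.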