Abstract
Automatic human action recognition plays an important role in many real-world applications, such as video surveillance, virtual reality, and intelligent human-computer interaction. Spatial complexity and temporal variability are the main challenges to be addressed. Most traditional methods rely on hand-crafted video features, which limits their expressive power and their ability to generalize. In recent years, with the rise of deep networks, deep learning methods have been applied to automatic human action recognition and have achieved better performance. In this paper we present a novel Convolutional Neural Network (CNN) based automatic human action recognition method that automatically learns the spatial and temporal characteristics of the data to improve recognition performance. Specifically, we preprocess the dataset by extracting keyframes with the inter-frame difference method, which reduces data redundancy while preserving the spatiotemporal characteristics of the data; we then use the real-time keypoint recognition system OpenPose to obtain skeleton information, consisting of human joint points that serve as the input features of our recognition model. For model training, we use the large UCF-101 dataset, a common benchmark in this field. For model evaluation, we compare our method with state-of-the-art methods; the experimental results show that our method achieves a significant performance improvement on UCF-101. Finally, based on the model we implement a system that uses a Kinect V2 to record human action in a real environment. Our system can automatically mark the range of human action and output the corresponding action labels in real time.
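The keyframe-extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes frames are given as grayscale NumPy arrays, and the function name `extract_keyframes` and the difference threshold are hypothetical choices for the example.

```python
import numpy as np

def extract_keyframes(frames, threshold=10.0):
    """Inter-frame difference method (illustrative sketch):
    keep a frame when its mean absolute pixel difference from the
    previously kept frame exceeds `threshold`, reducing redundancy
    while retaining frames where the scene changes. Returns the
    indices of the selected keyframes."""
    if not frames:
        return []
    keyframes = [0]  # the first frame is always kept as a reference
    last = frames[0].astype(np.float32)
    for i, frame in enumerate(frames[1:], start=1):
        cur = frame.astype(np.float32)
        # mean absolute difference against the last kept keyframe
        if np.mean(np.abs(cur - last)) > threshold:
            keyframes.append(i)
            last = cur  # update the reference to the new keyframe
    return keyframes

# Example on synthetic frames: two identical dark frames, then a bright one.
frames = [np.zeros((4, 4)), np.zeros((4, 4)), np.full((4, 4), 50.0)]
print(extract_keyframes(frames))  # the redundant second frame is dropped
```

In practice the threshold would be tuned to the dataset, and the same idea applies to color frames by differencing each channel or converting to grayscale first.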