Abstract

Human action recognition using 3D pose data has gained growing interest in the fields of human–robot interfaces and pattern recognition since hardware to capture human pose became available. In this paper, we propose a fast, simple, and powerful method of human action recognition based on human kinematic similarity. The key to this method is that the action descriptor consists of joint positions, angular velocities, and angular accelerations, which accommodates different individual body sizes and eliminates complex normalization. The angular parameters of joints within a short sliding time window (approximately 5 frames) around the current frame are used to describe each pose frame of a human action sequence. Moreover, three modified KNN (k-nearest-neighbors) classifiers are employed in our method: one to compute the confidence of every frame in the training step, one to estimate the frame label of each descriptor, and one to classify actions. Additionally estimating each frame’s time label makes it possible to handle single input frames, so the approach can be applied to difficult, unsegmented sequences. The proposed method is efficient and runs in real time. Our research also shows that many public datasets are irregularly segmented, and a simple method is provided to regularize them. The approach is tested on challenging datasets such as MSR-Action3D, MSRDailyActivity3D, and UTD-MHAD; the results indicate that our method achieves higher accuracy.
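To make the descriptor concrete, below is a minimal sketch (not the authors' code) of how joint angles and their finite-difference angular velocities and accelerations over a roughly 5-frame sliding window could be assembled into a per-frame descriptor. The skeleton layout (`bone_pairs`), the use of `np.gradient` for differencing, the edge handling, and all function names are illustrative assumptions.

```python
import numpy as np

def joint_angles(skeleton, bone_pairs):
    """Angle at each joint of one frame.

    skeleton   : (J, 3) array of 3D joint positions.
    bone_pairs : list of (parent, joint, child) index triples naming the
                 two bones that meet at each joint (an assumed layout).
    """
    angles = []
    for parent, joint, child in bone_pairs:
        v1 = skeleton[parent] - skeleton[joint]
        v2 = skeleton[child] - skeleton[joint]
        c = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
        angles.append(np.arccos(np.clip(c, -1.0, 1.0)))
    return np.asarray(angles)

def frame_descriptor(sequence, t, bone_pairs, window=5, dt=1.0):
    """Descriptor for frame t: joint positions plus angular velocity and
    acceleration estimated over a short window centered on t.
    Assumes the sequence holds at least two frames."""
    half = window // 2
    lo, hi = max(0, t - half), min(len(sequence), t + half + 1)
    angles = np.stack([joint_angles(f, bone_pairs) for f in sequence[lo:hi]])
    vel = np.gradient(angles, dt, axis=0)  # finite-difference angular velocity
    acc = np.gradient(vel, dt, axis=0)     # finite-difference angular acceleration
    i = t - lo                             # position of frame t inside the window
    return np.concatenate([np.ravel(sequence[t]), vel[i], acc[i]])
```

Because the descriptor is built from joint angles rather than raw limb lengths, subjects of different body sizes map to similar features, which is the property the abstract credits with eliminating complex normalization.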

Highlights

  • Human action recognition has been an active research topic in recent years [1]

  • We propose a novel angular spatio-temporal descriptor for human action recognition

  • We consider each frame in a sequence as independent and put forward a classifier to estimate each frame’s time label, enabling real-time action recognition (a minimal sketch of this idea follows the list)
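As a rough illustration of the per-frame labeling idea in the last highlight, here is a minimal sketch using standard k-NN models from scikit-learn rather than the paper's modified KNN classifiers; the training-time confidence classifier is omitted, and the majority-vote rule and all names are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

def fit_frame_models(descriptors, action_labels, time_labels, k=5):
    """Fit two per-frame k-NN models on training data.

    descriptors   : (N, D) per-frame descriptors pooled from all sequences.
    action_labels : (N,) action class of the sequence each frame came from.
    time_labels   : (N,) normalized position (0..1) of each frame in its sequence.
    """
    time_model = KNeighborsRegressor(n_neighbors=k).fit(descriptors, time_labels)
    class_model = KNeighborsClassifier(n_neighbors=k).fit(descriptors, action_labels)
    return time_model, class_model

def classify_frames(frames, time_model, class_model):
    """Label an (unsegmented) stream of frame descriptors.

    Each frame is treated independently: one model estimates its time
    label, the other its action class; the sequence-level class here is
    a simple majority vote over frames (an assumed voting rule).
    """
    times = time_model.predict(frames)   # estimated per-frame time labels
    votes = class_model.predict(frames)  # per-frame action predictions
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)], times
```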


Introduction

Human action recognition has been an active research topic in recent years [1]. Most previous research on human action recognition was performed on conventional 2D color maps or sequences, much of it based on global and local spatio-temporal features [2,3,4]. Real-time action recognition methods based on color cameras and depth sensors can be roughly divided into two categories by feature source. In [10], the authors put forward the Space-Time Occupancy Patterns (STOP) feature to represent actions from depth maps; such methods were mostly proposed in the early stage of the human action recognition field. Vivek Veeriah et al. [24] proposed a differential Recurrent Neural Network (dRNN) based on the Long Short-Term Memory (LSTM) network; they claimed that their method can classify any time-series data, whether real-world 2D or 3D human action data. This paper offers several contributions. First, an angular spatio-temporal descriptor is proposed; it contains the pose of the current frame and combines kinematic parameters within a short time window around the current frame to represent actions.

