Abstract

Human action recognition applications benefit greatly from commodity depth sensors capable of skeleton tracking. Some of these applications (e.g., customizable gesture interfaces) require learning new actions at runtime and may not have many training instances. This paper presents a human action recognition method designed for flexibility, which allows taking users’ feedback to improve recognition performance and adding a new action instance without computationally expensive optimization for training classifiers. Our nearest neighbor-based action classifier adopts dynamic time warping to handle variability in execution rate. In addition, it uses the confidence values associated with each tracked joint position to mask erroneous trajectories, for robustness against noise. We evaluate the proposed method on various datasets with different frame rates, actors, and noise. The experimental results demonstrate its adequacy for learning actions from depth sequences at runtime. We achieve an accuracy comparable to state-of-the-art techniques on the challenging MSR-Action3D dataset.
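The classifier described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`dtw_distance`, `classify`), the use of a per-frame scalar confidence, and the product weighting of the two confidences are choices made for the sketch only.

```python
import numpy as np

def dtw_distance(a, b, conf_a=None, conf_b=None):
    """Dynamic time warping distance between two joint trajectories
    a (n, d) and b (m, d). Optional per-frame confidences (assumed
    scalar here) down-weight frames with unreliable joint estimates."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Confidence masking (illustrative): scale the local cost
            # by the product of the two frames' tracker confidences,
            # so noisy frames contribute less to the alignment cost.
            w = 1.0
            if conf_a is not None and conf_b is not None:
                w = conf_a[i - 1] * conf_b[j - 1]
            cost = w * np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(query, query_conf, templates):
    """1-nearest-neighbor classification against stored action templates,
    where templates is a list of (label, trajectory, confidences).
    Adding a new action at runtime is just appending a template --
    no classifier retraining is required."""
    best = min(templates,
               key=lambda t: dtw_distance(query, t[1], query_conf, t[2]))
    return best[0]
```

Because DTW aligns the two sequences on a warping path, the nearest-neighbor match tolerates differences in execution rate; the confidence weighting is what keeps tracker noise from dominating that cost.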

Highlights

  • Human action recognition (HAR) attracts the attention of many researchers due to its numerous applications, such as video surveillance, human computer interaction, and video analysis [1]

  • By applying dynamic time warping (DTW), we gain robustness against variations in execution rate, which heavily affect HAR. Although DTW is more sensitive to noise in the joint position estimates, we effectively alleviate this problem by using the confidence values provided by the skeleton tracker itself

  • In conclusion, we have presented a flexible method for recognizing actions from depth sequences, based on the generation of action templates from estimated joint trajectories

Introduction

Human action recognition (HAR) attracts the attention of many researchers due to its numerous applications, such as video surveillance, human–computer interaction, and video analysis [1]. Providing a machine with the ability to recognize human actions from an image sequence is a challenging task due to the large variability of actions across several factors [2]. In [3], three main sources of variability are identified: viewpoint, execution rate/speed, and anthropometry. The recent commoditization of depth sensors provides a way to reduce this variability using depth information [4]. These sensors provide the 3D structure of a scene, which facilitates the understanding of human actions under conditions in which 2D approaches may be ineffective (e.g., motion perpendicular to the camera plane). Depth sensors have opened a door for the development of novel …
