Abstract

Human action recognition plays a key role in human-computer interaction in complex environments. However, similar actions lead to poorly discriminative feature sequences and reduce recognition accuracy. This paper proposes Action-Fusion, a Multi-label Subspace Learning (MLSL) method that combines skeleton data with a depth-map representation called Depth Sequential Information Entropy Maps (DSIEM) for human action recognition over multiple modal features. DSIEM describe the spatial information of human motion with information entropy and capture temporal information by stitching frames together; they reduce the redundancy of depth sequences and effectively capture spatial motion states. MLSL models both the relationships among different modalities and the inherent connections among different labels. The method is evaluated on three public datasets: the Microsoft Action 3D dataset (MSR Action3D), the University of Texas at Dallas Multimodal Human Action Dataset (UTD-MHAD), and UTD-MHAD Kinect Version 2 (UTD-MHAD-Kinect V2). Experimental results show that the proposed MLSL model obtains new state-of-the-art results, achieving average recognition rates of 93.55% on MSR Action3D, 88.37% on UTD-MHAD, and 90.66% on UTD-MHAD-Kinect V2.

Highlights

  • Human action recognition aims to distinguish the types of human movements or actions in a video, and has important application prospects in areas such as medical health [1], video surveillance [2], and human-robot interaction [3]

  • Although great progress has been achieved in human action recognition, several challenges remain unsolved, such as varying lighting conditions and occlusions

  • To minimize the correlation between similar samples while maintaining the correlation between samples of different classes, Multi-label Subspace Learning (MLSL) is formulated as an optimization problem
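The highlight above describes MLSL's goal of projecting multimodal features into a common, label-aware subspace. As the paper's actual objective is not reproduced here, the following is only a minimal stand-in: per-modality ridge projections into the shared label space, with the dimensions, regularizer, and averaging fusion all illustrative assumptions rather than the authors' formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d1, d2, c = 100, 32, 20, 5   # samples, two modality dims, classes

X1 = rng.normal(size=(n, d1))          # e.g. depth-map (DSIEM) features
X2 = rng.normal(size=(n, d2))          # e.g. skeleton features
Y = np.eye(c)[rng.integers(0, c, n)]   # one-hot label matrix

def ridge_projection(X, Y, lam=1e-2):
    """Closed-form ridge solution P = (X^T X + lam*I)^{-1} X^T Y,
    mapping one modality into the shared label subspace."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

P1 = ridge_projection(X1, Y)
P2 = ridge_projection(X2, Y)

# Fuse the two modalities by averaging their projections in the
# common subspace; prediction is the strongest label dimension.
Z = (X1 @ P1 + X2 @ P2) / 2
pred = Z.argmax(axis=1)
print(Z.shape)  # (100, 5)
```

A richer model, as the highlight suggests, would additionally couple the projection matrices through label-correlation and sample-correlation terms; the sketch keeps only the shared-subspace idea.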


Summary

INTRODUCTION

Human action recognition aims to distinguish the types of human movements or actions in a video, and has important application prospects in areas such as medical health [1], video surveillance [2], and human-robot interaction [3]. Researchers have introduced various human action recognition methods for Red-Green-Blue (RGB) videos. Most existing methods operate on whole depth videos, which may lose the spatial and detail information needed for recognition. We first introduce a depth-video feature extraction algorithm called Depth Sequential Information Entropy Maps (DSIEM). In addition to learning projection matrices that map multimodal features to a common subspace, MLSL explores the inherent relationships among different class labels and finds distinguishable common features. The method fuses two modalities: depth videos and skeleton sequences.
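To make the entropy-based spatial encoding concrete, the sketch below computes Shannon entropy of depth values per spatial cell across the temporal axis of a depth sequence. The cell size, bin count, and pooling layout are illustrative assumptions; the paper's exact DSIEM construction (including its stitching of temporal segments) is not reproduced here.

```python
import numpy as np

def local_entropy(values, bins=16):
    """Shannon entropy (bits) of a 1-D array of depth values."""
    hist, _ = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins to avoid log(0)
    return -np.sum(p * np.log2(p))

def entropy_map(depth_seq, cell=8, bins=16):
    """Per-cell entropy over time for a depth sequence of shape (T, H, W).
    High entropy marks regions where depth varies, i.e. where motion occurs."""
    T, H, W = depth_seq.shape
    out = np.zeros((H // cell, W // cell))
    for i in range(H // cell):
        for j in range(W // cell):
            block = depth_seq[:, i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            out[i, j] = local_entropy(block.ravel(), bins=bins)
    return out

# Toy 30-frame, 64x64 depth sequence with synthetic values.
seq = rng_seq = np.random.default_rng(0).integers(0, 4096, size=(30, 64, 64)).astype(float)
emap = entropy_map(seq)
print(emap.shape)  # (8, 8)
```

In a full DSIEM pipeline, entropy maps for successive temporal segments would then be stitched side by side so that one image carries both spatial and temporal motion information.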

RELATED WORK
EXPERIMENTAL RESULTS
RECOGNITION PERFORMANCE EVALUATION
CONCLUSION