Abstract

Recently, the rapid development of inexpensive RGB-D sensors, like the Microsoft Kinect, has provided adequate information for human action recognition. In this paper, a recognition algorithm is presented in which the feature representation is generated by concatenating spatial features from the human contours of key frames with temporal features from the time-difference information of a sequence. An improved multi-hidden-layer extreme learning machine is then introduced as the classifier. Finally, we evaluate our scheme on the public UTD-MHAD dataset in terms of recognition accuracy and time consumption.
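The pipeline described above, concatenating spatial and temporal descriptors and feeding them to an extreme learning machine, can be sketched as follows. This is a minimal illustration, not the paper's method: the feature dimensions, the toy data, and the single hidden layer are assumptions (the paper uses an improved multi-hidden-layer ELM whose details are not given here).

```python
# Sketch of the abstract's pipeline: concatenate spatial (contour) and
# temporal (time-difference) features, then classify with a basic ELM.
# Dimensions, data, and the single hidden layer are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def make_feature(spatial, temporal):
    """Concatenate spatial and temporal descriptors into one vector."""
    return np.concatenate([spatial, temporal])

class ELM:
    """Basic single-hidden-layer extreme learning machine."""
    def __init__(self, n_in, n_hidden, n_classes):
        # Hidden weights are random and never trained -- the core ELM idea.
        self.W = rng.standard_normal((n_in, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.beta = np.zeros((n_hidden, n_classes))

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # sigmoid

    def fit(self, X, y, n_classes):
        H = self._hidden(X)
        T = np.eye(n_classes)[y]           # one-hot targets
        self.beta = np.linalg.pinv(H) @ T  # closed-form least squares
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Toy usage: 40 samples of two classes separable by their spatial part.
X = np.stack([make_feature(rng.normal(c, 0.3, 8), rng.normal(0, 0.3, 4))
              for c in (0, 3) for _ in range(20)])
y = np.array([0] * 20 + [1] * 20)
elm = ELM(n_in=12, n_hidden=50, n_classes=2).fit(X, y, n_classes=2)
pred = elm.predict(X).argmax(axis=1)
```

Because the hidden weights are fixed, training reduces to a single pseudoinverse, which is why ELM training is fast compared with backpropagation, one reason the paper reports time consumption alongside accuracy.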

Highlights

  • Action recognition has been a hot research topic due to its wide range of applications in many areas, such as intelligent video surveillance, smart living and human-computer interaction [1]-[3].

  • Hsu et al. [16] introduced a new scheme by producing a SpatioTemporal Matrix Intensity (STMI) from raw RGB images and a SpatioTemporal Matrix Depth (STMD) from depth images, respectively.

  • HoG and HoF features were generated by constructing BoW-Pyramids, which made it possible to classify reversed actions, such as sit-to-stand versus stand-to-sit.


Summary

INTRODUCTION

Action recognition has been a hot research topic due to its wide range of applications in many areas, such as intelligent video surveillance, smart living and human-computer interaction [1]-[3]. Compared with traditional color sequences, depth sequences are invariant and stable with respect to illumination and body appearance, and they provide body structure and shape information for action classification. Beyond methods that use information only from depth sequences, some methods combine multiple sources of information, such as color data, skeleton data and depth maps, for action recognition. Two feature representation methods are introduced for action classification. Hsu et al. [16] introduced a new scheme by producing a SpatioTemporal Matrix Intensity (STMI) from raw RGB images and a SpatioTemporal Matrix Depth (STMD) from depth images, respectively. This method was demonstrated to be view-invariant.

KEY FRAMES EXTRACTION
FEATURE REPRESENTATION
ACTION RECOGNITION
UTD-MHAD Dataset and Tests Setting
Comparison with Other Methods
CONCLUSION

