Abstract

In this paper, we introduce a new human action recognition framework consisting of feature extraction, feature coding, network tuning, and action recognition. Spatial–temporal combined features are designed to represent human actions in both space and time, yielding higher accuracy than individual spatial or temporal features alone. A new layer is added to the Deep Belief Network (DBN) to unify the length of the spatial–temporal features. For the training stage, we propose several improvements, including model initialization and fine-tuning of the DBN’s parameters. We evaluate our framework on the public MSR-Action 3D dataset. Our experiments show that the spatial–temporal features achieve an accuracy of 93.4%, an improvement of 1.09% and 11.71% over the uncombined temporal and spatial features, respectively.
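The length-unifying layer described above can be illustrated with a minimal sketch: spatial and temporal descriptors of differing lengths are concatenated, then a single linear layer maps the result to a fixed-length vector suitable for a DBN's input. All dimensions, names, and the random (untrained) weights below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def unify_length(spatial_feat, temporal_feat, out_dim=256, rng=rng):
    """Concatenate spatial and temporal features, then project to a
    fixed length. In the paper's framework this projection would be a
    learned layer; random weights stand in here for illustration."""
    combined = np.concatenate([spatial_feat, temporal_feat])
    W = rng.standard_normal((out_dim, combined.shape[0]))
    return W @ combined

# Hypothetical descriptors of different lengths (dimensions are made up):
spatial = rng.standard_normal(120)   # e.g. a per-frame pose descriptor
temporal = rng.standard_normal(310)  # e.g. a motion descriptor over time

unified = unify_length(spatial, temporal)
print(unified.shape)  # (256,)
```

In practice such a layer would be trained jointly with the rest of the network during the fine-tuning stage the abstract mentions, so that the projection preserves the discriminative information in both feature types.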
