Abstract
In this paper, we introduce a new human action recognition framework consisting of feature extraction, feature coding, network tuning, and action recognition. Spatial–temporal combined features are designed to represent human actions in both space and time, yielding higher accuracy than individual spatial or temporal features. A new layer is added to the Deep Belief Network (DBN) to unify the lengths of the spatial–temporal features. In the training stage, we propose several improvements, including model initialization and fine-tuning of the DBN's parameters. We evaluate our framework on the public MSR-Action 3D dataset. Our experiments indicate that the spatial–temporal features achieve an accuracy of 93.4%, an improvement of 1.09% and 11.71% over the temporal and spatial features alone, respectively.