Abstract

This paper presents a new method for human action recognition that fuses depth images and skeletal maps. Each depth image is represented by 2D and 3D auto-correlation of gradients features, which capture spatial and orientational auto-correlation. Mutual information is used to measure the similarity between frames in the skeleton sequence, and key frames are then extracted from the sequence based on this similarity. Skeleton features extracted from the key frames serve as complementary features to compensate for the loss of temporal information in the depth images. Each feature set is fed to its own extreme learning machine classifier, and a different weight is assigned to each classifier; weighting the classifiers separately gives the method more flexibility in handling the two feature types. The final class label is determined from the fused classifier outputs. Experiments on the MSR_Action3D depth action dataset show that the accuracy of the proposed method is 1.5% higher than that of state-of-the-art action recognition methods.
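
To make the key frame selection and decision-level fusion steps concrete, the following is a minimal sketch of how they could be implemented. The histogram-based mutual information estimator, the keep_ratio selection rule, the function names, and the fusion weight value are all illustrative assumptions; the abstract does not specify these details, so this is a sketch of the general technique rather than the authors' exact procedure.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram-based mutual information between two flattened skeleton
    frames (joint-coordinate vectors). A generic estimator; the paper's
    exact estimator is not specified in the abstract."""
    joint_hist, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint_hist / joint_hist.sum()          # joint probability
    px = pxy.sum(axis=1, keepdims=True)          # marginal over x
    py = pxy.sum(axis=0, keepdims=True)          # marginal over y
    nz = pxy > 0                                 # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def select_key_frames(skeleton_seq, keep_ratio=0.5):
    """Score each frame by its MI with the previous frame and keep the
    frames least similar to their neighbour (i.e. the most informative
    ones). Hypothetical selection rule, for illustration only."""
    n = len(skeleton_seq)
    sims = np.array([mutual_information(skeleton_seq[i - 1], skeleton_seq[i])
                     for i in range(1, n)])
    k = max(1, int(keep_ratio * (n - 1)))
    key_idx = np.argsort(sims)[:k] + 1           # lowest-similarity frames
    return np.sort(key_idx)

def fuse_scores(depth_scores, skeleton_scores, w_depth=0.6):
    """Weighted decision-level fusion of the two classifiers' per-class
    scores; the weight here is a placeholder, not the paper's value."""
    fused = w_depth * depth_scores + (1.0 - w_depth) * skeleton_scores
    return int(np.argmax(fused))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = [rng.normal(size=60) for _ in range(30)]   # 30 frames, 20 joints x 3
    print("key frames:", select_key_frames(seq))
    d, s = rng.random(20), rng.random(20)            # scores for 20 actions
    print("fused label:", fuse_scores(d, s))
```

In this sketch the extreme learning machine classifiers are abstracted away: each is assumed to output a per-class score vector, and only the fusion of those vectors is shown.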
