Human interaction recognition fusing multiple features of depth sequences

Jianjun Li, Lan Wang, Xia Mao, Lijiang Chen

doi:10.1049/iet-cvi.2017.0025

Abstract

Human interaction recognition plays a major role in building intelligent video surveillance systems. Recently, depth data captured by emerging RGB-D sensors have begun to show their importance for human interaction recognition. This study proposes a novel framework for human interaction recognition from depth information, including an algorithm that reconstructs a depth sequence from as few key frames as possible. The proposed framework consists of two essential modules: key frames are first extracted under a sparse constraint, and a fused multi-feature representation is then constructed from two types of available features using max-pooling. Finally, the fused features are fed directly to a support vector machine (SVM) to recognise the human activity. This study explores a method of fusing static and dynamic features that exploits the contextual relevance of consecutive frames to improve recognition performance. A weight is used to fuse shape and optical flow features, which not only enhances the ability to describe human behavioural characteristics in the spatiotemporal domain but also effectively reduces the adverse impact of distorted interest points on target recognition. Experimental results on a public action dataset show that the proposed approach yields a considerable improvement in accuracy over state-of-the-art approaches.
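The abstract does not state the fusion formula, so the following is only a minimal sketch of how weighted fusion with max-pooling could look. It assumes that per-frame shape (static) and optical flow (dynamic) descriptors have already been computed for the extracted key frames, that max-pooling is applied element-wise over the key frames within each feature type, and that a single weight `w` scales the pooled shape descriptor while `1 - w` scales the pooled flow descriptor before concatenation. The function names `max_pool` and `fuse_features` and the parameter `w` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def max_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over a sequence of per-frame feature vectors."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frame vectors must have the same length")
    return [max(f[i] for f in frames) for i in range(dim)]


def fuse_features(
    shape_frames: Sequence[Sequence[float]],
    flow_frames: Sequence[Sequence[float]],
    w: float = 0.5,
) -> List[float]:
    """Hypothetical weighted fusion of static and dynamic descriptors.

    Assumption (not from the paper): each feature type is max-pooled over
    the key frames, the pooled shape vector is scaled by w, the pooled
    optical-flow vector by (1 - w), and the two are concatenated.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    shape = max_pool(shape_frames)  # static descriptor, one value per dimension
    flow = max_pool(flow_frames)    # dynamic descriptor, one value per dimension
    return [w * x for x in shape] + [(1.0 - w) * x for x in flow]


if __name__ == "__main__":
    # Two key frames: 2-D shape features and 3-D optical-flow features per frame.
    shape_frames = [[0.1, 0.4], [0.3, 0.2]]
    flow_frames = [[1.0, 0.0, 0.5], [0.2, 0.8, 0.1]]
    fused = fuse_features(shape_frames, flow_frames, w=0.5)
    print(len(fused))  # -> 5
```

In such a pipeline, one fused vector per sequence would then be passed to an SVM classifier. Concatenating the weighted vectors keeps the static and dynamic parts separable, so `w` can be tuned on validation data to trade off shape cues against motion cues.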

Full Text