Deep Learning-Based Human Action Recognition with Key-Frames Sampling Using Ranking Methods

Nusrat Tasnim,Joong-Hwan Baek

doi:10.3390/app12094165

Abstract

Nowadays, the demand for human–machine or object interaction is growing tremendously owing to its diverse applications. The massive advancement in modern technology has greatly influenced researchers to adopt deep learning models in the fields of computer vision and image-processing, particularly human action recognition. Many methods have been developed to recognize human activity, which is limited to effectiveness, efficiency, and use of data modalities. Very few methods have used depth sequences in which they have introduced different encoding techniques to represent an action sequence into the spatial format called dynamic image. Then, they have used a 2D convolutional neural network (CNN) or traditional machine learning algorithms for action recognition. These methods are completely dependent on the effectiveness of the spatial representation. In this article, we propose a novel ranking-based approach to select key frames and adopt a 3D-CNN model for action classification. We directly use the raw sequence instead of generating the dynamic image. We investigate the recognition results with various levels of sampling to show the competency and robustness of the proposed system. We also examine the universality of the proposed method on three benchmark human action datasets: DHA (depth-included human action), MSR-Action3D (Microsoft Action 3D), and UTD-MHAD (University of Texas at Dallas Multimodal Human Action Dataset). The proposed method secures better performance than state-of-the-art techniques using depth sequences.

Full Text