Abstract
Human action recognition (HAR) has been used in a variety of applications such as gaming, healthcare, surveillance, and robotics. Extensive research has been conducted on utilizing color, depth, and skeletal data to achieve high-performance HAR. Compared with color and depth data, skeletal data are more compact and therefore more efficient for computation and storage. Moreover, skeletal data are invariant to clothing texture, background, and lighting conditions. With the rise of deep learning, HAR has received considerable attention, and Spatial-Temporal Graph Convolutional Networks (ST-GCN) have proved to be a state-of-the-art architecture for HAR using skeletal data. However, this does not hold on challenging datasets that contain incomplete and noisy skeletal data. In this paper, a new method is proposed for HAR by adding a Feature Fusion module to ST-GCN and applying hyperparameter optimization. The performance of the proposed method is evaluated on the challenging CMDFALL dataset and the newly built MICA-Action3D dataset. Experimental results show that the proposed method significantly improves the performance of ST-GCN on these challenging datasets.
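The abstract does not specify how the Feature Fusion module is implemented. As an illustration only, the following PyTorch sketch shows one common way to fuse two skeleton feature streams, by channel-wise concatenation followed by a 1x1 convolution; the class name FeatureFusion, the channel sizes, and the fusion strategy are assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn


class FeatureFusion(nn.Module):
    """Illustrative fusion block (assumption, not the paper's module):
    concatenates two feature streams along the channel axis and projects
    the result back with a 1x1 convolution."""

    def __init__(self, channels_a: int, channels_b: int, out_channels: int):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(channels_a + channels_b, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # Both inputs have shape (N, C, T, V): batch, channels, frames, joints,
        # matching the tensor layout typically used by ST-GCN backbones.
        return self.project(torch.cat([feat_a, feat_b], dim=1))


if __name__ == "__main__":
    # Toy usage: fuse a low-level and a high-level skeleton feature map
    # for a batch of 8 clips, 64 frames, 25 joints.
    low = torch.randn(8, 64, 64, 25)
    high = torch.randn(8, 256, 64, 25)
    fusion = FeatureFusion(channels_a=64, channels_b=256, out_channels=256)
    fused = fusion(low, high)
    print(fused.shape)  # torch.Size([8, 256, 64, 25])
```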