Abstract

Human action recognition (HAR) accuracy degrades in scenes with occlusion and human–object interaction. To address this issue, this paper proposes a novel multi-modal fusion framework for HAR. Within this framework, a module called improved attention long short-term memory (IAL) is proposed, which combines an improved SE-ResNet50 (ISE-ResNet50) with long short-term memory (LSTM). IAL extracts both the video sequence features and the skeleton sequence features of human behaviour. To improve HAR performance at a high semantic level, the obtained multi-modal sequence features are fed into a coupled hidden Markov model (CHMM), and a multi-modal IAL+CHMM method called IALC is developed based on a probabilistic graphical model. To evaluate the proposed method, experiments are conducted on the HMDB51, UCF101, Kinetics-400, and ActivityNet datasets, where the obtained recognition accuracies are 86.40%, 97.78%, 81.12%, and 69.36%, respectively. The experimental results show that in complex environments, the proposed multi-modal fusion method for HAR based on IALC achieves more accurate recognition results.
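Since the abstract only names the building blocks, a rough sketch may help fix ideas. Below is a minimal PyTorch sketch of an IAL-style module: a squeeze-and-excitation (SE) attention block applied to ResNet50 frame features, followed by an LSTM over the frame sequence. The class and parameter names (`SEBlock`, `IAL`, `hidden_size`) are illustrative assumptions; the paper's exact ISE-ResNet50 modifications are not specified in the abstract.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweights feature channels by global context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                            # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                       # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)   # excitation: channel weights
        return x * w                                 # channel-wise recalibration

class IAL(nn.Module):
    """Attention-augmented CNN backbone followed by an LSTM over frames (sketch)."""
    def __init__(self, hidden_size: int = 512, num_classes: int = 51):
        super().__init__()
        backbone = resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.se = SEBlock(2048)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(2048, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, clip):                         # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        x = self.features(clip.flatten(0, 1))        # per-frame CNN features
        x = self.pool(self.se(x)).flatten(1)         # (B*T, 2048)
        x = x.view(b, t, -1)                         # restore temporal axis
        seq, _ = self.lstm(x)                        # temporal modelling
        return self.head(seq[:, -1])                 # classify from final state
```

For the fusion stage, a coupled HMM ties two Markov chains (here, video and skeleton) so that each chain's next state depends on both chains' previous states. The sketch below runs the forward algorithm over the joint state space of a two-chain CHMM, assuming discretised per-modality observations; all names and shapes are illustrative, as the abstract does not describe the CHMM's structure or training.

```python
import numpy as np

def chmm_forward(obs_v, obs_s, pi, A, B_v, B_s):
    """Log-likelihood of an observation pair under a two-chain coupled HMM.

    obs_v, obs_s : observation index sequences for the video / skeleton chains
    pi           : (Nv, Ns) initial joint-state distribution
    A            : (Nv, Ns, Nv, Ns) coupled transitions
                   P(v_t, s_t | v_{t-1}, s_{t-1})
    B_v, B_s     : per-chain emission matrices, shapes (Nv, Mv) and (Ns, Ms)
    """
    alpha = pi * np.outer(B_v[:, obs_v[0]], B_s[:, obs_s[0]])
    scale = alpha.sum()
    alpha /= scale                                    # rescale for stability
    loglik = np.log(scale)
    for t in range(1, len(obs_v)):
        alpha = np.einsum("ij,ijkl->kl", alpha, A)    # propagate joint state
        alpha *= np.outer(B_v[:, obs_v[t]], B_s[:, obs_s[t]])  # emission weights
        scale = alpha.sum()
        alpha /= scale
        loglik += np.log(scale)
    return loglik
```

Under this reading, one CHMM would be trained per action class (e.g., via EM) and a test sequence assigned to the class whose model yields the highest log-likelihood; the abstract does not detail the actual training or decoding procedure.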
