Abstract

Skeleton-based human action recognition is attracting increasing attention and is widely applied in virtual reality, human–computer interaction systems, and other domains. Nevertheless, the performance of recent methods on actions with similar appearance features remains unsatisfactory, owing to their inherent weakness in modeling discriminative temporal dynamics. In addition, previous methods are limited in their ability to mine the global information of an action. To this end, we propose a multiple temporal scale aggregation graph convolutional network. First, taking advantage of the varying temporal resolutions offered by different layers of the graph convolutional network, we develop a multiple temporal scale aggregation module to extract discriminative temporal features. Second, we propose a new skeleton feature representation, termed relative joint across frames, which provides stronger global structure cues than absolute coordinates. Furthermore, we propose a five-stream structure that comprehensively models complementary features and ultimately yields a significant performance boost. In our experiments, the method improves on the baseline by 2.38% and 4.08% on the cross-subject evaluation benchmarks of NTU-RGB+D 60 and NTU-RGB+D 120, respectively.
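The sketch below is a rough, hedged illustration of the two core ideas named above, not the authors' exact formulation: a cross-frame relative-joint feature (subtracting a reference-frame pose so the feature encodes global motion rather than absolute positions) and a simple multi-scale temporal aggregation (a pooling pyramid that fuses coarse and fine temporal resolutions). Both function names, the choice of reference frame, and the pooling scales are illustrative assumptions.

```python
import numpy as np

def relative_joints_across_frames(skel, ref_frame=0):
    # skel: (T, V, C) -- T frames, V joints, C coordinates (e.g. x, y, z).
    # Express each joint relative to its own position in a reference frame,
    # so the feature carries cross-frame (global motion) structure instead
    # of per-frame absolute coordinates. Reference-frame choice is an
    # illustrative assumption.
    skel = np.asarray(skel, dtype=np.float32)
    return skel - skel[ref_frame:ref_frame + 1]

def multi_temporal_scale_aggregate(feat, scales=(1, 2, 4)):
    # feat: (T, D) per-frame features. For each scale s, average-pool over
    # non-overlapping windows of s frames, upsample back to full length,
    # and average across scales -- a toy stand-in for aggregating the
    # different temporal resolutions produced by different network layers.
    feat = np.asarray(feat, dtype=np.float32)
    T, D = feat.shape
    out = np.zeros_like(feat)
    for s in scales:
        Ts = (T // s) * s                                  # truncate to a multiple of s
        coarse = feat[:Ts].reshape(-1, s, D).mean(axis=1)  # (T // s, D)
        out[:Ts] += np.repeat(coarse, s, axis=0)           # upsample to Ts frames
    return out / len(scales)

# Toy usage: 16 frames, 25 joints (NTU-style skeleton), 3-D coordinates.
seq = np.random.rand(16, 25, 3).astype(np.float32)
rel = relative_joints_across_frames(seq)            # (16, 25, 3); frame 0 is all zeros
fused = multi_temporal_scale_aggregate(rel.reshape(16, -1))
print(rel.shape, fused.shape)                       # (16, 25, 3) (16, 75)
```

In the paper itself these roles are played by learned graph-convolutional layers and a five-stream fusion; the sketch only shows the shape of the computation each idea implies.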
