Abstract

Skeleton-based hand gesture recognition (SHGR) is a challenging task due to the complex articulated topology of hands. Previous works often learn hand characteristics from a single observation viewpoint, disregarding the contextual information hidden in multiple viewpoints. To resolve this issue, we propose a novel multi-view hierarchical aggregation network for SHGR. First, two-dimensional non-uniform spatial sampling, a novel strategy that forms extrinsic parameter distributions of virtual cameras, is presented to enumerate viewpoints from which to observe hand skeletons. We then apply coordinate transformations to generate multi-view hand skeletons and employ a multi-branch convolutional neural network to extract multi-view features. Finally, a novel hierarchical aggregation network, comprising a hierarchical attention architecture and global context modeling, fuses the multi-view features for final classification. Experiments on three benchmark datasets demonstrate that our method is competitive with state-of-the-art approaches.
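The core idea of generating multi-view hand skeletons by coordinate transformation can be illustrated with a minimal sketch. The azimuth/elevation parameterization, the 21-joint layout, and the specific angles below are illustrative assumptions, not the paper's actual extrinsic parameter distributions:

```python
import numpy as np

def view_transform(joints, azimuth, elevation):
    """Rotate 3D hand-joint coordinates into a virtual camera's frame.

    joints: (J, 3) array of joint positions.
    azimuth, elevation: virtual-camera angles in radians (a hypothetical
    parameterization; the paper's sampling strategy is non-uniform and
    not reproduced here).
    """
    ca, sa = np.cos(azimuth), np.sin(azimuth)
    ce, se = np.cos(elevation), np.sin(elevation)
    # Rotation about the y-axis (azimuth) followed by the x-axis (elevation).
    Ry = np.array([[ca, 0.0, sa], [0.0, 1.0, 0.0], [-sa, 0.0, ca]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, ce, -se], [0.0, se, ce]])
    return joints @ (Rx @ Ry).T

# One skeleton observed from several virtual viewpoints; each view would
# feed one branch of the multi-branch CNN.
skeleton = np.random.rand(21, 3)  # 21 hand joints (illustrative)
views = [view_transform(skeleton, az, el)
         for az in (0.0, np.pi / 4) for el in (0.0, np.pi / 6)]
```

Because each view is a rigid rotation of the same skeleton, inter-joint distances are preserved while the projected appearance changes, which is what gives the downstream branches complementary context.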
