Abstract

Hand gesture recognition from images is a longstanding computer vision task that can serve as a bridge for human-computer interaction and sign language translation. A number of methods have been proposed for hand gesture recognition (HGR); however, they remain less effective in difficult scenarios such as hand gestures at different scales and complex backgrounds. In this paper, we propose an end-to-end multiscale feature learning network for HGR, which consists of a CNN-based backbone network, a feature aggregation pyramid network (FAPN) embedded with a two-stage expansion-squeeze-aggregation (ESA) module, and three task-specific prediction branches. First, the backbone network extracts multiscale features from the original hand gesture images. Next, the FAPN with the two-stage ESA module extensively exploits this multiscale information and learns gesture-specific features at different scales. During training, a mask loss guides the network to locate hand-specific regions; finally, the classification and regression branches output the category and location of a hand gesture during both training and prediction. Experimental results on two publicly available datasets show that the proposed method outperforms most state-of-the-art HGR methods.
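
As one plausible reading of the pipeline described above, the PyTorch sketch below wires a set of backbone feature maps through an FPN-style aggregation with a two-stage ESA refinement, followed by three task branches. The abstract does not specify any internals, so the module designs here (the ESA block, the FAPN fusion, the HGRHeads branches, and parameters such as expansion and out_channels) are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the described pipeline, assuming PyTorch. The abstract
# gives no layer-level details, so every design choice below (ESA internals,
# FAPN fusion, head layout, channel widths) is a hypothetical illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ESA(nn.Module):
    """Assumed expansion-squeeze-aggregation block: expand channels with a
    1x1 conv, squeeze back, and aggregate with the input via a residual sum."""
    def __init__(self, channels, expansion=2):
        super().__init__()
        self.expand = nn.Conv2d(channels, channels * expansion, 1)
        self.squeeze = nn.Conv2d(channels * expansion, channels, 1)

    def forward(self, x):
        return x + self.squeeze(F.relu(self.expand(x)))


class FAPN(nn.Module):
    """Assumed feature aggregation pyramid: FPN-style top-down fusion of the
    backbone's multiscale features, each level refined by a two-stage ESA."""
    def __init__(self, in_channels, out_channels=128):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.esa = nn.ModuleList(
            [nn.Sequential(ESA(out_channels), ESA(out_channels))  # two stages
             for _ in in_channels])

    def forward(self, feats):
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        fused = [laterals[-1]]  # start from the coarsest level
        for lat in reversed(laterals[:-1]):
            up = F.interpolate(fused[0], size=lat.shape[-2:], mode="nearest")
            fused.insert(0, lat + up)
        return [esa(f) for esa, f in zip(self.esa, fused)]


class HGRHeads(nn.Module):
    """Assumed task-specific branches: class scores, box regression offsets,
    and a hand-region mask whose loss supervises localization in training."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.cls = nn.Conv2d(channels, num_classes, 3, padding=1)
        self.reg = nn.Conv2d(channels, 4, 3, padding=1)   # (dx, dy, dw, dh)
        self.mask = nn.Conv2d(channels, 1, 3, padding=1)  # mask loss only in training

    def forward(self, x):
        return self.cls(x), self.reg(x), self.mask(x)


# Hypothetical usage with three backbone levels (shapes chosen arbitrarily):
feats = [torch.randn(1, c, s, s) for c, s in [(256, 64), (512, 32), (1024, 16)]]
fapn = FAPN([256, 512, 1024])
heads = HGRHeads(128, num_classes=10)
cls_out, reg_out, mask_out = heads(fapn(feats)[0])  # finest pyramid level
```

Under the same assumptions, the mask output would be supervised by the mask loss only at training time, while the classification and regression outputs provide the gesture category and location at both training and prediction time, matching the role the abstract assigns to each branch.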
