Abstract

Hand gesture recognition from images is a longstanding computer vision task that can serve as a bridge for human-computer interaction and sign language translation. A number of methods have been proposed for hand gesture recognition (HGR); however, they remain less effective in difficult scenarios such as hand gestures at different scales and complex backgrounds. In this paper, we propose an end-to-end multiscale feature learning network for HGR, which consists of a CNN-based backbone network, a feature aggregation pyramid network (FAPN) embedded with a two-stage expansion-squeeze-aggregation (ESA) module, and three task-specific prediction branches. First, the backbone network extracts multiscale features from the original hand gesture images. Next, the FAPN with the two-stage ESA module extensively exploits this multiscale information and learns gesture-specific features at different scales. During training, a mask loss guides the network to locate hand-specific regions; finally, the classification and regression branches output the category and location of a hand gesture during both training and prediction. Experimental results on two publicly available datasets show that the proposed method outperforms most state-of-the-art HGR methods.
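
As one plausible reading of the pipeline described above, the PyTorch sketch below wires a set of backbone feature maps through an FPN-style aggregation with a two-stage ESA refinement, followed by three task branches. The abstract does not specify any internals, so the module designs here (the ESA block, the FAPN fusion, the HGRHeads branches, and parameters such as expansion and out_channels) are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the described pipeline, assuming PyTorch. The abstract
# gives no layer-level details, so every design choice below (ESA internals,
# FAPN fusion, head layout, channel widths) is a hypothetical illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ESA(nn.Module):
    """Assumed expansion-squeeze-aggregation block: expand channels with a
    1x1 conv, squeeze back, and aggregate with the input via a residual sum."""
    def __init__(self, channels, expansion=2):
        super().__init__()
        self.expand = nn.Conv2d(channels, channels * expansion, 1)
        self.squeeze = nn.Conv2d(channels * expansion, channels, 1)

    def forward(self, x):
        return x + self.squeeze(F.relu(self.expand(x)))


class FAPN(nn.Module):
    """Assumed feature aggregation pyramid: FPN-style top-down fusion of the
    backbone's multiscale features, each level refined by a two-stage ESA."""
    def __init__(self, in_channels, out_channels=128):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.esa = nn.ModuleList(
            [nn.Sequential(ESA(out_channels), ESA(out_channels))  # two stages
             for _ in in_channels])

    def forward(self, feats):
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        fused = [laterals[-1]]  # start from the coarsest level
        for lat in reversed(laterals[:-1]):
            up = F.interpolate(fused[0], size=lat.shape[-2:], mode="nearest")
            fused.insert(0, lat + up)
        return [esa(f) for esa, f in zip(self.esa, fused)]


class HGRHeads(nn.Module):
    """Assumed task-specific branches: class scores, box regression offsets,
    and a hand-region mask whose loss supervises localization in training."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.cls = nn.Conv2d(channels, num_classes, 3, padding=1)
        self.reg = nn.Conv2d(channels, 4, 3, padding=1)   # (dx, dy, dw, dh)
        self.mask = nn.Conv2d(channels, 1, 3, padding=1)  # mask loss only in training

    def forward(self, x):
        return self.cls(x), self.reg(x), self.mask(x)


# Hypothetical usage with three backbone levels (shapes chosen arbitrarily):
feats = [torch.randn(1, c, s, s) for c, s in [(256, 64), (512, 32), (1024, 16)]]
fapn = FAPN([256, 512, 1024])
heads = HGRHeads(128, num_classes=10)
cls_out, reg_out, mask_out = heads(fapn(feats)[0])  # finest pyramid level
```

Under the same assumptions, the mask output would be supervised by the mask loss only at training time, while the classification and regression outputs provide the gesture category and location at both training and prediction time, matching the role the abstract assigns to each branch.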
