Abstract

Hand gesture recognition from images is a longstanding computer vision task that can serve as a bridge for human-computer interaction and sign language translation. A number of methods have been proposed for hand gesture recognition (HGR); however, difficult scenarios such as hand gestures at different scales and complex backgrounds make them less effective. In this paper, we propose an end-to-end multiscale feature learning network for HGR, which consists of a CNN-based backbone network, a feature aggregation pyramid network (FAPN) embedded with a two-stage expansion-squeeze-aggregation (ESA) module, and three task-specific prediction branches. First, the backbone network extracts multiscale features from the original hand gesture images. Next, the FAPN with the two-stage ESA module extensively exploits multiscale feature information and learns hand gesture-specific features at different scales. Then, a mask loss guides the network to locate hand-specific regions during training, and finally, the classification and regression branches output the category and location of a hand gesture during both training and prediction. Experimental results on two publicly available datasets show that the proposed method outperforms most state-of-the-art HGR methods.
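To make the described pipeline concrete, the following is a minimal PyTorch sketch of the architecture the abstract outlines: a multiscale backbone, an FAPN neck with a two-stage ESA block per level, and classification, regression, and mask branches. The abstract specifies only this overall structure, so the backbone stages, the SE-style reading of the ESA block, the channel counts, the number of pyramid levels, and the head layouts (the names TinyBackbone, FAPN, ESA, and HGRNet are ours) are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ESA(nn.Module):
    """Assumed expansion-squeeze-aggregation block: expand channels with a
    conv, squeeze with global-pooling channel attention (an SE-style guess),
    then aggregate back to the input width with a residual connection."""
    def __init__(self, channels, expansion=2):
        super().__init__()
        mid = channels * expansion
        self.expand = nn.Conv2d(channels, mid, 3, padding=1)
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(mid, mid, 1), nn.Sigmoid())
        self.aggregate = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        e = F.relu(self.expand(x))
        return self.aggregate(e * self.squeeze(e)) + x


class FAPN(nn.Module):
    """Assumed top-down feature pyramid; two chained ESA blocks
    ("two-stage") refine each fused level."""
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.esa = nn.ModuleList(
            nn.Sequential(ESA(out_channels), ESA(out_channels))
            for _ in in_channels)

    def forward(self, feats):  # feats ordered fine-to-coarse
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):  # top-down fusion
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [esa(x) for esa, x in zip(self.esa, laterals)]


class TinyBackbone(nn.Module):
    """Stand-in multiscale CNN backbone (the paper's backbone choice is
    not specified in the abstract)."""
    def __init__(self):
        super().__init__()
        def stage(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.stem, self.s3, self.s4, self.s5 = (
            stage(3, 64), stage(64, 256), stage(256, 512), stage(512, 1024))

    def forward(self, x):
        c3 = self.s3(self.stem(x))
        c4 = self.s4(c3)
        c5 = self.s5(c4)
        return [c3, c4, c5]  # fine-to-coarse multiscale features


class HGRNet(nn.Module):
    """End-to-end sketch: backbone -> FAPN/ESA -> classification, box
    regression, and mask branches; the mask branch supplies the
    training-time mask loss that points the network at hand regions."""
    def __init__(self, num_classes=10, channels=256):
        super().__init__()
        self.backbone = TinyBackbone()
        self.neck = FAPN()
        self.cls_head = nn.Conv2d(channels, num_classes, 3, padding=1)
        self.reg_head = nn.Conv2d(channels, 4, 3, padding=1)   # box offsets
        self.mask_head = nn.Conv2d(channels, 1, 3, padding=1)  # hand region

    def forward(self, x):
        feats = self.neck(self.backbone(x))
        cls = [self.cls_head(f) for f in feats]  # per-scale class maps
        reg = [self.reg_head(f) for f in feats]  # per-scale box maps
        mask = self.mask_head(feats[0])          # finest level for mask loss
        return cls, reg, mask


if __name__ == "__main__":
    cls, reg, mask = HGRNet()(torch.randn(1, 3, 256, 256))
    print([c.shape for c in cls], mask.shape)
```

In training, each head would be paired with a suitable loss (e.g. cross-entropy for classification, an L1 or IoU loss for box regression, and a pixel-wise loss for the mask branch); since the abstract uses the mask loss only as training-time guidance, the mask branch can simply be dropped at inference.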
