Abstract
This paper presents an effective deep attention network for joint hand gesture localization and recognition from static RGB-D images. Our method trains a CNN framework based on a soft attention mechanism in an end-to-end manner, automatically localizing hands and classifying gestures with a single network rather than relying on the conventional stage-wise pipeline of hand segmentation/detection followed by classification. More precisely, our attention network first computes a weight for each proposal generated from the entire image, reflecting the probability that a hand appears in that region. It then performs a global sum over all proposals, weighted by these attention scores, to obtain a representation of the entire image. We demonstrate the feasibility and effectiveness of our method through extensive experiments on the NTU Hand Digits (NTU-HD) benchmark and the challenging HUST American Sign Language (HUST-ASL) dataset. Moreover, the proposed attention network is simple to train, requiring no bounding-box or segmentation-mask annotations, which makes it easy to deploy in hand gesture recognition systems. Using the proposed attention network with RGB-D images as input, we obtain state-of-the-art hand gesture recognition performance on the challenging HUST-ASL dataset.
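The attention pooling described in the abstract can be sketched as follows: each proposal's feature vector is scored, the scores are normalized into attention weights, and the image-level representation is the weighted sum of proposal features. This is a minimal NumPy illustration; the linear scoring function, feature dimension, and proposal count are illustrative assumptions, not the paper's exact CNN architecture (where the scoring is learned end-to-end).

```python
import numpy as np

def soft_attention_pool(proposal_feats, w, b=0.0):
    """Pool N proposal feature vectors into one image-level vector.

    proposal_feats: (N, D) array, one row per region proposal.
    w: (D,) scoring weights; b: scalar bias. In the paper these would be
    learned jointly with the CNN; here they are fixed for illustration.
    Returns (attention weights over proposals, pooled (D,) representation).
    """
    scores = proposal_feats @ w + b              # (N,) relevance of each proposal
    scores = scores - scores.max()               # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax attention weights
    image_repr = alpha @ proposal_feats          # weighted global sum -> (D,)
    return alpha, image_repr

# Toy example: 5 proposals with 8-dimensional features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 8))
alpha, image_repr = soft_attention_pool(feats, rng.standard_normal(8))
```

Proposals judged likely to contain a hand receive larger weights and thus dominate the pooled representation, which is what lets the network localize and classify in one pass.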