Abstract

Hand gestures are considered an effective means of communication in the field of human-computer interaction. However, designing an efficient hand gesture recognition (HGR) system remains challenging owing to complexities of real-world environments such as cluttered backgrounds, illumination changes, and occlusion. This paper proposes a lightweight CNN-based network named CrossFeat: Multi-scale Cross Feature Aggregation network for explicit hand gesture recognition. CrossFeat employs multi-scale convolutional layers and preserves the spatial features of the hand gesture region. The use of multi-scale filters (1 × 1, 3 × 3, 5 × 5 and 7 × 7) allows the network to learn granular and coarse edges from different regions of the hand gesture. These complementary features enhance the learning ability of the network. Moreover, the cross-layer connectivity enables gradient information to reach the top layers and prevents it from diminishing in the upstream layers. The proposed network is evaluated on three benchmark datasets: ASL Finger Spelling, NUS-I and NUS-II. The experimental results and analysis show that aggregating multi-scale and cross features improves the performance of the HGR system compared with existing networks.
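
The multi-scale idea can be illustrated with a small, hedged sketch. It is not the CrossFeat architecture: the function names, the 64 × 64 input, the 3 input channels and the 8 output channels per branch are all assumptions chosen for illustration. It only shows why parallel 1 × 1, 3 × 3, 5 × 5 and 7 × 7 branches can be aggregated channel-wise, assuming stride 1 and zero-padding of (k - 1) / 2 so that every branch keeps the same spatial size, and how the parameter cost grows with the filter size.

```python
# Hypothetical sketch: names, channel counts and input size are assumptions,
# not values taken from the CrossFeat paper.

KERNEL_SIZES = (1, 3, 5, 7)  # filter sizes named in the abstract


def same_padding(kernel_size):
    """Zero-padding per side that keeps height/width unchanged at stride 1."""
    return (kernel_size - 1) // 2


def conv_output_size(input_size, kernel_size, padding, stride=1):
    """Standard convolution output-size formula along one spatial axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel for a k x k convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def multi_scale_block(input_size, in_channels, channels_per_branch):
    """Describe parallel k x k branches whose outputs are concatenated channel-wise."""
    branches = []
    for k in KERNEL_SIZES:
        out_size = conv_output_size(input_size, k, same_padding(k))
        branches.append({
            "kernel": k,
            "out_size": out_size,
            "params": conv_params(in_channels, channels_per_branch, k),
        })
    # Concatenation is only valid if every branch keeps the same spatial size.
    assert len({b["out_size"] for b in branches}) == 1
    return {
        "out_size": branches[0]["out_size"],
        "out_channels": channels_per_branch * len(branches),
        "params": sum(b["params"] for b in branches),
        "branches": branches,
    }


block = multi_scale_block(input_size=64, in_channels=3, channels_per_branch=8)
print(block["out_size"], block["out_channels"], block["params"])
```

Under these assumed settings, all four branches keep the 64 × 64 resolution, so their feature maps can be stacked into one 32-channel tensor. The 7 × 7 branch accounts for most of the parameters, which is one reason a lightweight design would keep the per-branch channel count small.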
