Abstract

Robust hand gesture recognition is a critical ingredient of many human-computer interaction applications. Recently, Deep Neural Network (DNN) models have been widely applied to a variety of computer vision problems; however, few researchers have applied DNNs to the recognition of static hand gestures. In this paper, an approach is proposed that leverages recent progress on Convolutional Neural Networks (CNNs) and uses a deep-learning-based strategy to recognize 24 hand gestures from the American Sign Language (ASL). The approach builds on recently released small CNN architectures to exploit their high accuracy while maintaining a small model size, making deployment on resource-constrained devices feasible. Specifically, the SqueezeNet and MobileNets architectures are applied to both RGB and depth images, and a late-fusion strategy is used to concatenate features from the two modalities. In addition, an effective encoding of depth images as rendered RGB images is proposed so that depth data can be processed by CNNs, which further improves recognition performance. Experiments are conducted on the challenging, publicly available American Sign Language Finger Spelling (ASL-FS) dataset, acquired with an RGB-D camera, to evaluate the proposed architecture. The results show that the proposed approach achieves a higher average accuracy than all previous methods and is well suited to deployment on resource-constrained devices and in embedded vision applications.
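The abstract names two concrete mechanisms without giving implementation details: rendering depth maps as three-channel RGB images so a standard CNN can consume them, and late fusion by feature concatenation across the RGB and depth streams. The sketch below, in PyTorch, is a minimal illustration of both ideas under stated assumptions: the MobileNetV2 backbone, the jet colormap, the pooled 1280-dimensional feature size, and the single linear classifier are all illustrative choices, not the authors' published configuration.

# Minimal sketch of the two-stream late-fusion idea, assuming a
# MobileNetV2 backbone from torchvision. The paper's actual backbone
# variant, fusion layer sizes, and depth rendering may differ.
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

def encode_depth_as_rgb(depth: np.ndarray) -> np.ndarray:
    """Render a single-channel depth map as a 3-channel RGB image.

    Normalizes depth to [0, 255] and applies a jet colormap; the jet
    colormap is an illustrative assumption, not the paper's scheme.
    """
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)
    d8 = (d * 255).astype(np.uint8)
    bgr = cv2.applyColorMap(d8, cv2.COLORMAP_JET)
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

class LateFusionNet(nn.Module):
    """Two MobileNetV2 streams (RGB and rendered depth) whose pooled
    features are concatenated before a single classification layer."""
    def __init__(self, num_classes: int = 24):
        super().__init__()
        self.rgb_stream = models.mobilenet_v2(weights=None).features
        self.depth_stream = models.mobilenet_v2(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # MobileNetV2 yields 1280-dim pooled features per stream.
        self.classifier = nn.Linear(2 * 1280, num_classes)

    def forward(self, rgb: torch.Tensor, depth_rgb: torch.Tensor) -> torch.Tensor:
        f_rgb = self.pool(self.rgb_stream(rgb)).flatten(1)
        f_depth = self.pool(self.depth_stream(depth_rgb)).flatten(1)
        # Late fusion: concatenate the two feature vectors, then classify.
        return self.classifier(torch.cat([f_rgb, f_depth], dim=1))

# Example: one 224x224 RGB frame and its rendered depth counterpart.
model = LateFusionNet()
logits = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 24]) -> one score per ASL gesture

Concatenating pooled features from independently trained streams keeps each backbone small and swappable (SqueezeNet could replace MobileNetV2 in either branch), which is consistent with the paper's emphasis on small models for resource-constrained devices.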
