Abstract

Two of the most effective and accurate deep convolutional neural network architectures for real-time hand gesture recognition, the faster region-based convolutional neural network (Faster R-CNN) Inception V2 model and the single shot detector (SSD) Inception V2 model, are proposed. The proposed models are evaluated on standard data sets (NUS hand posture data set-II, Senz-3D) and a custom-developed data set (MITI hand data set, MITI-HD). The performance metrics are analysed for intersection over union (IoU) values ranging between 0.5 and 0.95. An IoU threshold of 0.5 resulted in higher precision than the other IoU settings considered (0.5:0.95 and 0.75). The Faster R-CNN Inception V2 model achieved higher precision (APall = 0.990 at IoU = 0.5) than the SSD Inception V2 model (APall = 0.984) on MITI-HD 160. The computation time of the Faster R-CNN Inception V2 model is higher than that of the SSD Inception V2 model, but it produced fewer mispredictions. Increasing the number of samples (MITI-HD 300) improved APall to 0.991. The improvements in large (APlarge) and medium (APmedium) detections were not significant compared to small (APsmall) detections. It is concluded that the Faster R-CNN Inception V2 model is highly suitable for a real-time hand gesture recognition system under unconstrained environments.
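The AP metrics reported above are thresholded on intersection over union. As a minimal illustrative sketch (not code from the paper), the IoU of two axis-aligned bounding boxes, each given as (x1, y1, x2, y2), can be computed as:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

For example, boxes (0, 0, 10, 10) and (5, 5, 15, 15) intersect in a 5 x 5 region, giving IoU = 25 / 175, which would count as a miss at the 0.5 threshold but contributes to the averaged 0.5:0.95 metric's lower end only if it clears at least 0.5.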

