Abstract

Vision-based hand gesture recognition is often employed to remotely control machines and robots. In this work, we compare multiple convolutional neural network (CNN) models, namely VGG16, InceptionNet, EfficientNet, and a self-designed CNN model, for recognizing multiple hand gestures captured with a video camera. The video dataset was self-collected and comprises 24 dynamic hand gestures performed by 5 users. Key-frames are extracted from the input videos and processed by the considered CNN models for classification. The models are optimized for classification by varying the learning rate and the type of optimizer used during training. While VGG16, InceptionNet, and the proposed CNN model all attain the highest classification accuracy of 99.3055%, the proposed CNN model has the fewest trainable parameters. Hence, the proposed CNN model is a lightweight model suitable for deployment on edge devices, while yielding performance comparable to the well-known CNN models.
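The abstract mentions extracting key-frames from the input videos before classification. The paper's exact selection method is not stated here; below is a minimal, hypothetical sketch assuming simple uniform temporal sampling, which is a common baseline for this step:

```python
def keyframe_indices(total_frames: int, n_keyframes: int) -> list[int]:
    """Pick n_keyframes frame indices spread evenly across a video.

    Assumption: uniform temporal sampling; the actual key-frame
    extraction used in the paper may differ (e.g., motion- or
    content-based selection).
    """
    if n_keyframes <= 0 or total_frames <= 0:
        return []
    if n_keyframes == 1 or total_frames == 1:
        return [0]
    # Evenly spaced indices from the first frame to the last frame.
    return [i * (total_frames - 1) // (n_keyframes - 1)
            for i in range(n_keyframes)]


# Example: select 5 key-frames from a 100-frame clip.
print(keyframe_indices(100, 5))  # → [0, 24, 49, 74, 99]
```

The selected frames would then be resized to each CNN's expected input resolution and passed through the network for gesture classification.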
