Abstract

In this paper, we present an efficient and reliable deep-learning approach that allows users to communicate with robots via hand gesture recognition. In contrast to other works that rely on external devices such as gloves [1] or joysticks [2] to tele-operate robots, the proposed approach uses only visual information to recognize the user's instructions, which are encoded in a set of pre-defined hand gestures. Specifically, the method consists of two modules that work sequentially: the first extracts 2D hand landmarks (i.e., joint positions), and the second predicts the hand gesture from a temporal representation of those landmarks. The approach has been validated on a recent state-of-the-art dataset, where it outperformed methods that require multiple pre-processing steps such as optical flow and semantic segmentation. Our method achieves an accuracy of 87.5% and runs at 10 frames per second. Finally, we conducted real-life experiments with our IVO robot to validate the framework during the interaction process.
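
To make the two-stage pipeline concrete, the sketch below shows one possible realization: a per-frame 2D hand-landmark extractor feeding a temporal classifier over a window of frames. The joint count, window length, gesture-set size, and the use of an LSTM are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract:
# (1) per-frame 2D hand-landmark extraction, (2) gesture classification
# from the temporal sequence of landmarks.
import torch
import torch.nn as nn

NUM_JOINTS = 21    # e.g. 21 keypoints for one hand (assumption)
SEQ_LEN = 30       # frames in the temporal window (assumption)
NUM_GESTURES = 5   # size of the pre-defined gesture set (assumption)

class GestureClassifier(nn.Module):
    """Classifies a sequence of 2D hand landmarks into a gesture label."""
    def __init__(self, num_joints=NUM_JOINTS, hidden=64, num_classes=NUM_GESTURES):
        super().__init__()
        # Each frame is flattened to (x, y) coordinates per joint.
        self.lstm = nn.LSTM(input_size=2 * num_joints, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, landmarks):           # landmarks: (B, T, 2 * num_joints)
        _, (h_n, _) = self.lstm(landmarks)
        return self.head(h_n[-1])           # logits over gesture classes

# Stage 1 (not shown): a hand-landmark detector such as MediaPipe Hands or
# OpenPose would produce the (B, T, 2 * NUM_JOINTS) tensor for each video clip.
dummy_clip = torch.randn(1, SEQ_LEN, 2 * NUM_JOINTS)
logits = GestureClassifier()(dummy_clip)
print(logits.argmax(dim=-1))               # predicted gesture index
```

In this kind of design, the landmark extractor replaces the heavier pre-processing (optical flow, semantic segmentation) used by the baselines mentioned above, which is what allows the overall pipeline to run at interactive frame rates.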
