Abstract

Gesture recognition is an intensively researched area for several reasons. One of the most important is this technology’s numerous applications across domains (e.g., robotics, games, medicine, and automotive). Additionally, the introduction of three-dimensional (3D) image acquisition techniques (e.g., stereovision, projected light, and time of flight) overcomes the limitations of traditional two-dimensional (2D) approaches. Combined with the wider availability of 3D sensors (e.g., Microsoft Kinect, Intel RealSense, and the photonic mixer device (PMD) CamCube), these advances have sparked recent interest in the domain. Moreover, in many computer vision tasks, traditional statistical approaches have been outperformed by deep neural network-based solutions. In view of these considerations, we propose a deep neural network solution employing the PointNet architecture for hand gesture recognition using depth data produced by a time-of-flight (ToF) sensor. We created a custom hand gesture dataset and propose a multistage hand segmentation pipeline consisting of filtering, clustering, locating the hand within a volume of interest, and hand-forearm segmentation. For comparison purposes, two equivalent datasets were tested: a 3D point cloud dataset and a 2D image dataset, both obtained from the same stream. Beyond the inherent advantages of 3D technology, the 3D method using PointNet is shown to outperform the 2D method in all circumstances, even when the 2D method employs a deep neural network.
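As a concrete illustration of the classification stage, the sketch below shows a minimal PointNet-style point cloud classifier in PyTorch. This is only a sketch under stated assumptions, not the authors’ implementation: the layer widths follow the original PointNet classification network, the input/feature alignment (T-Net) blocks are omitted, and the number of gesture classes and points per cloud are placeholder values.

```python
# Minimal PointNet-style classifier sketch (assumption: layer sizes from the
# original PointNet paper; class count and point count are placeholders).
import torch
import torch.nn as nn

class PointNetClassifier(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared per-point MLP implemented as 1x1 convolutions over (B, 3, N).
        self.features = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classification head applied to the global feature vector.
        self.classifier = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, 3, num_points) -- x, y, z per point of the hand cloud.
        x = self.features(points)
        # Symmetric max pooling over points yields a global shape descriptor.
        x, _ = torch.max(x, dim=2)
        return self.classifier(x)

# Example: classify a batch of two 1024-point hand clouds into 10 gesture classes.
model = PointNetClassifier(num_classes=10)
logits = model(torch.randn(2, 3, 1024))  # -> shape (2, 10)
```

The max pooling over the point dimension is the central design choice: being a symmetric function, it makes the output invariant to the ordering of the input points, which is what lets the network consume raw, unordered point clouds such as those produced by a ToF sensor.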

Highlights

  • Gesture recognition has numerous applications: human–computer interaction (HCI), human–robot interaction (HRI), video surveillance, security, sports, and more

  • Sensors based on the time of flight (ToF) principle emerged as a promising technology with clear advantages over two-dimensional (2D) approaches

  • Regarding the principles used for three-dimensional (3D) hand gesture recognition, one could divide them into two broad classes: (1) engineered features, extracted from 3D data, and (2) implicit features, extracted automatically using a deep neural network



Introduction

Gesture recognition has numerous applications: human–computer interaction (HCI), human–robot interaction (HRI), video surveillance, security, sports, and more. Sensors based on the time-of-flight (ToF) principle have emerged as a promising technology with clear advantages over two-dimensional (2D) approaches. These sensors are (1) non-intrusive, since only depth data need be collected; (2) invariant to ambient illumination, which enables use in low light or complete darkness; and (3) amenable to a simple segmentation process [1]. One approach uses skin color to detect and segment the hand and obtain binary silhouettes [2]; these are further normalized using gesture geometry, and Krawtchouk moment features, which are argued to be robust to viewpoint changes, are used to classify the gesture. Histograms of optical flow with dynamic time warping (DTW) simultaneously performed …
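To make advantage (3) concrete, the snippet below sketches the kind of simple depth-based segmentation that ToF data enables: keep only the pixels lying within a fixed depth band of the nearest valid measurement, assuming the hand is the object closest to the sensor. The band width, the zero-means-invalid convention, and the synthetic frame are illustrative assumptions, not values from the cited works.

```python
# Sketch of depth-band hand segmentation (assumptions: hand is the closest
# object to the sensor; invalid ToF pixels are reported as 0; 15 cm band).
import numpy as np

def segment_hand(depth: np.ndarray, band_m: float = 0.15) -> np.ndarray:
    """Return a boolean mask of pixels within band_m metres of the nearest
    valid depth value."""
    valid = depth > 0
    nearest = depth[valid].min()
    return valid & (depth <= nearest + band_m)

# Example on a synthetic 4x4 depth frame (metres): hand at ~0.5 m, body at ~1.2 m.
frame = np.array([[0.0, 1.2,  1.2, 1.2],
                  [0.5, 0.52, 1.2, 1.2],
                  [0.5, 0.51, 1.2, 1.2],
                  [0.0, 1.2,  1.2, 1.2]])
mask = segment_hand(frame)  # True only on the ~0.5 m (hand) pixels
```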

