Abstract

In this paper, we propose an approach to the detection and recognition of 3D one-handed gestures for human-machine interaction. We describe the logical structure of the modules of the system for recording a gesture database, as well as the logical structure of the 3D gesture database itself. Example frames showing gestures in Full High Definition format, in the depth-map mode, and in the infrared mode are illustrated. Deep convolutional network models for detecting faces and hand shapes are described, and the results of automatic detection of the face region and the hand shape are given. The distinctive features of a gesture at a given point in time are identified, and the process of recognizing 3D one-handed gestures is described. Due to its versatility, this method can be applied in biometrics, computer vision, machine learning, automatic face recognition systems, and sign language processing.

Highlights

  • In the modern information society, increasing the level of automation and robotization across all areas of human activity is one of the most important tasks (Ryumin and Karpov, 2017)

  • This paper presents an approach to automatic detection and recognition of both static and dynamic 3D one-handed gestures in real time using an optical camera and a depth sensor (Kinect v2, 2019)

  • The hand shape detector is based on the SSD architecture with a MobileNetV2 network model (Sandler et al., 2018). It was trained on a multimedia database of 3D gestures of Russian sign language collected and labeled by the authors; a minimal detection sketch follows this list
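The sketch below shows how an SSD + MobileNetV2 detector of this kind could be run on a video frame using OpenCV's dnn module. It is an illustration under stated assumptions, not the authors' implementation: the model file names ("hand_ssd_mobilenet_v2.pb" and its .pbtxt config), the 300x300 input size, and the confidence threshold are hypothetical placeholders, and the authors' Russian sign language training data is not reproduced here.

```python
# Minimal sketch: hand detection with an SSD + MobileNetV2 model via OpenCV.
# Model/config file names below are hypothetical placeholders.
import cv2

net = cv2.dnn.readNetFromTensorflow("hand_ssd_mobilenet_v2.pb",
                                    "hand_ssd_mobilenet_v2.pbtxt")

def detect_hands(frame, conf_threshold=0.5):
    """Return bounding boxes (x1, y1, x2, y2) of detected hand shapes."""
    h, w = frame.shape[:2]
    # SSD with MobileNetV2 commonly expects a 300x300 input; adjust to the model.
    blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True)
    net.setInput(blob)
    detections = net.forward()  # SSD output shape: (1, 1, N, 7)
    boxes = []
    for det in detections[0, 0]:
        score = float(det[2])
        if score >= conf_threshold:
            # det[3:7] holds normalized (x1, y1, x2, y2); scale to pixels.
            x1, y1, x2, y2 = det[3] * w, det[4] * h, det[5] * w, det[6] * h
            boxes.append((int(x1), int(y1), int(x2), int(y2)))
    return boxes
```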


Summary

INTRODUCTION

In the modern information society, increasing the level of automation and robotization across all areas of human activity is one of the most important tasks (Ryumin and Karpov, 2017). For each gesture, the recording system stores (a sketch of one such record follows below):

  • Video recordings of the gesture (color format with an optical resolution of 1920x1080 pixels (FullHD); 512x424 pixels for the depth map and the infrared mode; frame rate of 30 frames per second);
  • Data on the coordinates describing the position of the skeleton in the video;
  • Images extracted frame by frame from the video, required for labeling.
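The following sketch shows one way a recorded gesture sample could be organized, following the modalities and resolutions listed above. The class and field names are assumptions made for illustration, not the authors' actual database schema.

```python
# Hypothetical record structure for one gesture sample, mirroring the listed
# modalities (FullHD color, 512x424 depth and infrared, skeleton, frames).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class GestureSample:
    gesture_label: str    # gesture identity, e.g. a Russian sign language sign
    color_video: str      # path to 1920x1080 (FullHD) color video, 30 fps
    depth_video: str      # path to 512x424 depth-map stream, 30 fps
    infrared_video: str   # path to 512x424 infrared stream, 30 fps
    # Per-frame skeleton joints as (x, y, z) coordinates from the depth sensor.
    skeleton: List[List[Tuple[float, float, float]]] = field(default_factory=list)
    # Paths of frames extracted from the video for manual labeling.
    labeled_frames: List[str] = field(default_factory=list)
```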

DESCRIPTION OF THE METHOD
CONCLUSIONS AND FUTURE WORK