Abstract

The work presented in this paper determines and evaluates the most efficient neural network architecture for multiple regression that localizes human body joints in 3D space from a single low-resolution depth image. The main challenge was to achieve high localization precision despite a noisy and coarse representation of the human body, as observed by a depth sensor from a large distance. The regression network was expected to reason about the relations between body parts in the depth image, extract the joint locations, and provide the coordinates defining the body pose. The method involved creating a dataset of 200,000 realistic depth images of a 3D body model, then training and testing numerous architectures, including a feedforward multilayer perceptron and deep convolutional neural networks. The results of training and evaluation are included and discussed. The most accurate DNN was further trained and evaluated on an augmented dataset of depth images. The achieved accuracy was similar to that of the reference Kinect algorithm, with the benefits of fast processing and significantly lower requirements on sensor resolution: it used 100 times fewer pixels than the Kinect depth sensor. The method was robust against sensor noise, tolerating imprecise depth measurements. Finally, our results were compared with the VGG, MobileNet, and ResNet architectures.

Highlights

  • Many modern computer interfaces employ gesture tracking and user pose estimation

  • The approach presented here uses very low resolution depth images and employs neural network-based regression that runs in real time on a medium-performance CPU

  • The Kinect sensor was chosen as the reference for the results presented in this paper because no other method for extracting joint positions from a depth image of the human body was as accurate or as well established


Summary

Introduction

Many modern computer interfaces employ gesture tracking and user pose estimation. The foremost example is the Kinect sensor [1, 23], which uses a depth camera and a dedicated algorithm based on random forests. The approach presented here uses very low resolution depth images and employs neural network-based regression that runs in real time on a medium-performance CPU. The goal was to propose and evaluate a DNN (deep convolutional neural network) multiple regression method for estimating the 3D coordinates of body joints that can operate on a very low resolution depth image (100 times fewer pixels than Kinect) and is more efficient than the prior method. The aims of the presented work were: to verify how selected neural network architectures perform on the joint localization task, to speed up the localization process, and to reduce the required size of the input depth map compared to the reference Kinect algorithm [1, 23].
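To make the multiple regression framing concrete, the following minimal sketch shows a network that maps one depth image to a vector of 3 × J real values, read as (x, y, z) per joint. It is an illustration only, not the architecture evaluated in the paper: the image size, the number of joints, the hidden layer width, and all function names are hypothetical, and the weights are random and untrained.

```python
import random

def make_mlp(n_in, n_hidden, n_out, seed=0):
    """Build an untrained two-layer perceptron with random weights (illustration only)."""
    rng = random.Random(seed)

    def layer(rows, cols):
        scale = (1.0 / cols) ** 0.5
        weights = [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]
        biases = [0.0] * rows
        return weights, biases

    return [layer(n_hidden, n_in), layer(n_out, n_hidden)]

def dense(weights, biases, x):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def predict_joints(mlp, depth_image, n_joints):
    """Regress 3 * n_joints values from a depth image and group them as (x, y, z)."""
    x = [d for row in depth_image for d in row]   # flatten the image row by row
    (w1, b1), (w2, b2) = mlp
    hidden = [max(0.0, v) for v in dense(w1, b1, x)]  # ReLU hidden layer
    y = dense(w2, b2, hidden)                     # linear output, since this is regression
    return [tuple(y[3 * j:3 * j + 3]) for j in range(n_joints)]

# Hypothetical sizes, chosen for illustration and not taken from the paper.
HEIGHT, WIDTH, N_JOINTS = 24, 32, 20

mlp = make_mlp(HEIGHT * WIDTH, 16, 3 * N_JOINTS)
rng = random.Random(1)
image = [[rng.uniform(0.0, 1.0) for _ in range(WIDTH)] for _ in range(HEIGHT)]
joints = predict_joints(mlp, image, N_JOINTS)
```

The output layer is linear rather than a softmax because every output is a continuous coordinate; a convolutional variant would replace the first dense layer with convolutional feature extraction while keeping the same 3 × J regression output.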

Related work
Method
Depth images dataset
Training of neural network architectures
Neural networks performance and accuracy evaluation
Localization accuracy improvements
Comparison with state-of-the-art architectures
Findings
Conclusions