Abstract

We propose a method that uses a convolutional neural network (CNN) to estimate human pose by analyzing projections of depth and ridge data, where ridge data are the local maxima of a distance transform map. To fully utilize the 3D information in the depth points, we project the depth and ridge data in several directions. The projection reduces 3D information loss, the ridge data help avoid joint drift, and the CNN increases localization accuracy. The proposed method proceeds as follows. (1) We use depth data to segment the human from the background and extract ridge data from the human silhouette. (2) We project the depth and ridge data onto the XY, XZ, and ZY planes. (3) ResNet-101 accepts the six projected images and uses 1 × 1 convolution layers to generate 2D heatmaps and offsets. (4) We generate 2D keypoints per plane by using the soft-argmax operation. (5) We obtain 3D joint positions by using fully connected layers. In experiments on the SMMC-10, EVAL, and ITOP datasets, the proposed method achieved state-of-the-art pose estimation accuracy. The proposed method can reduce the 3D information loss and the drift of joint positions that can occur during human pose estimation.
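Steps (2) and (4) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `project_points` and `soft_argmax_2d`, the 64 × 64 grid, the [-1, 1] coordinate bounds, the binary occupancy encoding, and the `beta` temperature are all assumptions made for illustration. Applying the projection to both the depth points and the ridge points would give the six images mentioned in step (3).

```python
import numpy as np

# Axis pairs kept by each projection plane (plane names follow the abstract).
PLANES = {"XY": (0, 1), "XZ": (0, 2), "ZY": (2, 1)}


def project_points(points, plane, size=64, lo=-1.0, hi=1.0):
    """Orthographically project an (N, 3) point cloud onto one plane.

    Returns a (size, size) binary occupancy image. The grid size and the
    [lo, hi] bounds are illustrative assumptions, not values from the paper.
    """
    a, b = PLANES[plane]
    # Map coordinates in [lo, hi] to integer pixel indices in [0, size - 1].
    scaled = (points[:, [a, b]] - lo) / (hi - lo) * (size - 1)
    idx = np.clip(np.rint(scaled).astype(int), 0, size - 1)
    image = np.zeros((size, size), dtype=np.float32)
    image[idx[:, 1], idx[:, 0]] = 1.0  # row = second axis, column = first axis
    return image


def soft_argmax_2d(heatmap, beta=1.0):
    """Return the softmax-weighted mean pixel coordinate (x, y) of a heatmap."""
    h, w = heatmap.shape
    logits = beta * heatmap.ravel()
    weights = np.exp(logits - logits.max())  # subtract the max for stability
    weights /= weights.sum()
    rows, cols = np.divmod(np.arange(h * w), w)
    return float(weights @ cols), float(weights @ rows)


# Toy usage: two 3D points projected onto all three planes.
points = np.array([[0.0, 0.5, -0.5], [1.0, -1.0, 0.0]])
images = {name: project_points(points, name) for name in PLANES}

# Toy heatmap with one sharp peak at row 10, column 20.
heatmap = np.zeros((64, 64))
heatmap[10, 20] = 50.0
x, y = soft_argmax_2d(heatmap)
```

Unlike a hard argmax, the soft-argmax output is a weighted average over all pixels, so it can take sub-pixel values and remains differentiable with respect to the heatmap.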
