Abstract

We propose an efficient and novel architecture for 3D articulated human pose retrieval and reconstruction from 2D landmarks extracted from a 2D synthetic image, an annotated 2D image, an in-the-wild real RGB image, or even a hand-drawn sketch. Given 2D joint positions in a single image, we devise a data-driven framework to infer the corresponding 3D human pose. To this end, we first normalize 3D human poses from a Motion Capture (MoCap) dataset by eliminating translation, orientation, and skeleton-size discrepancies, and then build a knowledge base by projecting a subset of joints of the normalized 3D poses onto 2D image planes using a variety of virtual cameras. With this approach, we not only transform the 3D pose space into a normalized 2D pose space but also resolve the 2D-3D cross-domain retrieval task efficiently. The proposed architecture searches the MoCap dataset for poses nearest to a given 2D query pose in a feature space defined over specific joint sets. The retrieved poses are then used to construct a weak-perspective camera and a final 3D posture under the camera model that minimizes the reconstruction error. To estimate the unknown camera parameters, we introduce a nonlinear, two-fold method that exploits the retrieved similar poses and the viewing directions at which the MoCap dataset was sampled to minimize the projection error. Finally, we evaluate our approach thoroughly on a large number of heterogeneous 2D examples generated synthetically, 2D images with ground truth, a variety of real in-the-wild internet images, and a proof of concept using 2D hand-drawn sketches of human poses. We conduct a pool of experiments to perform a quantitative study on the PARSE dataset. We also show that the proposed system yields competitive, convincing results in comparison to other state-of-the-art methods.
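The abstract describes normalizing each MoCap pose by removing translation, orientation, and skeleton-size discrepancies. The following sketch illustrates one plausible way to implement these three normalization steps; the joint indices and the choice of the hip-to-hip vector as the orientation reference are illustrative assumptions, not the paper's exact skeleton layout.

```python
import numpy as np

def normalize_pose(joints, root=0, lhip=1, rhip=6):
    """Normalize a 3D pose (J x 3 array): remove translation, orientation,
    and scale. Joint indices are illustrative placeholders."""
    # Translational normalization: move the root joint to the origin.
    p = joints - joints[root]
    # Orientational normalization: rotate about the vertical (z) axis so the
    # hip-to-hip vector aligns with the x-axis.
    hip = p[rhip] - p[lhip]
    theta = np.arctan2(hip[1], hip[0])
    c, s = np.cos(-theta), np.sin(-theta)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    p = p @ Rz.T
    # Skeleton-size normalization: scale so the mean joint distance from the
    # root is 1, removing subject-size discrepancies between actors.
    scale = np.mean(np.linalg.norm(p, axis=1))
    return p / scale if scale > 0 else p
```

Applying the same normalization to every MoCap pose ensures that the subsequent nearest-neighbor search compares poses rather than positions, headings, or body sizes.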

Highlights

  • We efficiently deal with the 2D-3D cross-modal Knn search and retrieval. We benefit from these K nearest neighbors in several ways: (i) We first predict the unknown camera parameters utilizing these Knn combined with the information of the view directions at which Motion Capture (MoCap) data is sampled. (ii) We learn a local pose model using these retrieved Knn in a Principal Component Analysis (PCA) space

  • The query input is from the CMU dataset (SDS 1), and the MoCap dataset is HDM05 (MDS hdm)

  • This paper proposes a novel and efficient architecture for 3D human pose search and retrieval that leads to 3D human pose estimation from a single static 2D image that is either a synthetic image, an annotated 2D image, an in-the-wild real image, or a hand-drawn sketch
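The highlights describe building a knowledge base by projecting normalized 3D poses through many virtual cameras, and then retrieving the K nearest 2D poses for a query. A minimal sketch of that cross-modal lookup follows; the flat-descriptor encoding, the brute-force distance computation, and the orthographic test camera are simplifying assumptions rather than the paper's exact pipeline.

```python
import numpy as np

def build_knowledge_base(poses_3d, cameras):
    """Project each normalized 3D pose (J x 3) with each virtual camera
    (3 x 4 projection matrix) into a flat, normalized 2D descriptor."""
    entries = []
    for i, P3 in enumerate(poses_3d):
        homo = np.hstack([P3, np.ones((P3.shape[0], 1))])
        for j, cam in enumerate(cameras):
            proj = homo @ cam.T              # J x 3 homogeneous image points
            pts = proj[:, :2] / proj[:, 2:3]
            pts = pts - pts.mean(axis=0)     # normalize the 2D pose as well
            pts = pts / np.linalg.norm(pts)
            entries.append((pts.ravel(), i, j))  # descriptor, pose id, view id
    return entries

def knn_query(entries, query_2d, k=5):
    """Return the k database entries nearest to a 2D query pose (J x 2)."""
    q = query_2d - query_2d.mean(axis=0)
    q = (q / np.linalg.norm(q)).ravel()
    dists = [np.linalg.norm(d - q) for d, _, _ in entries]
    order = np.argsort(dists)[:k]
    return [(entries[t][1], entries[t][2], dists[t]) for t in order]
```

Because each database entry records both the pose index and the virtual-camera index, a retrieved neighbor immediately supplies a candidate viewing direction as well as a candidate 3D pose, which is what the camera-estimation step exploits.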



Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We make the process of searching and retrieval more convenient and proficient. Once found, these Knn are utilized to estimate the unknown camera parameters and to predict the final 3D articulated human pose. As input, the proposed system uses 2D landmarks extracted from (a) heterogeneous 2D synthetic examples created from MoCap data using random camera parameters, (b) detected or annotated 2D poses in RGB images, (c) in-the-wild real images, or (d) hand-drawn sketches of human postures.
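Once similar poses are retrieved, the system estimates the unknown camera parameters and reconstructs the final 3D pose under a weak-perspective model. The sketch below shows one hedged interpretation of that fitting step: given a retrieved 3D pose, the 2D query, and a rotation initialized from the stored MoCap viewing direction, it solves in closed form for the scale and image translation that minimize the reprojection error. This is an assumed simplification, not the paper's exact two-fold nonlinear method.

```python
import numpy as np

def fit_weak_perspective(pose_3d, query_2d, R):
    """Fit weak-perspective scale s and image translation t so that
    s * (R @ pose)[:, :2] + t best matches the 2D query (least squares)."""
    X = (pose_3d @ R.T)[:, :2]           # rotated pose, orthographic xy
    Xc = X - X.mean(axis=0)
    qc = query_2d - query_2d.mean(axis=0)
    # Optimal scale for centered data: s = <Xc, qc> / <Xc, Xc>.
    s = np.sum(Xc * qc) / np.sum(Xc * Xc)
    # Optimal translation re-centers the scaled pose onto the query.
    t = query_2d.mean(axis=0) - s * X.mean(axis=0)
    reproj = s * X + t
    err = np.linalg.norm(reproj - query_2d)
    return s, t, err
```

Evaluating this fit for each retrieved neighbor (using its stored viewing direction for R) and keeping the candidates with the lowest reprojection error gives a simple way to rank the Knn before reconstructing the final pose.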

Related Work
Methodology
Pose Skeleton Description
Normalization
Translational Normalization
Orientational Normalization
Skeleton Size Normalization
Search and Retrieval
Camera Parameters
Pose Reconstruction
Retrieved Pose Error
Projection Control Error
Experiments
Datasets
MoCap Datasets
Input Datasets
Principal Components
Nearest Neighbors
Joint Weights
Energy Weights
Virtual Cameras
Evaluation on MDS cmu
Evaluation on MDS hdm
Evaluation on Noisy Input Data
Real Images of the PARSE Dataset
Hand-Drawn Sketches
Camera Viewpoints
Joints’ Sensitivity
Conclusions