Abstract

In this paper, we study the problem of monocular 3D human pose estimation based on deep learning. Because of the single-view limitation, monocular human pose estimation cannot avoid the inherent occlusion problem. A common remedy is multi-view 3D pose estimation; however, single-view images cannot be used directly in multi-view methods, which greatly limits practical applications. To address these issues, we propose a novel end-to-end network for monocular 3D human pose estimation. First, we propose a multi-view pose generator that predicts multi-view 2D poses from the 2D pose in a single view. Second, we propose a simple but effective data augmentation method for generating multi-view 2D pose annotations, since existing datasets (e.g., Human3.6M) do not contain a large number of 2D pose annotations from different views. Third, we employ a graph convolutional network to infer a 3D pose from the multi-view 2D poses. Experiments conducted on public datasets verify the effectiveness of our method. Furthermore, ablation studies show that our method improves the performance of existing 3D pose estimation networks.
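The paper does not spell out the augmentation procedure here, but a common way to synthesize multi-view 2D annotations from a single 3D pose is to rotate the pose about the vertical axis and project each rotated copy to 2D. The sketch below is a minimal, hedged illustration of that idea using an orthographic projection; the function name, axis convention, and projection model are assumptions, not the authors' exact method.

```python
import numpy as np

def generate_multiview_2d(pose_3d, num_views=4):
    """Rotate a 3D pose about the vertical (y) axis and orthographically
    project each rotated copy to 2D, yielding synthetic multi-view 2D poses.

    pose_3d: (J, 3) array of joint coordinates (x right, y vertical, z depth).
    Returns an array of shape (num_views, J, 2).
    """
    views = []
    for k in range(num_views):
        theta = 2.0 * np.pi * k / num_views
        c, s = np.cos(theta), np.sin(theta)
        # Rotation about the y axis: mixes x and z, leaves y unchanged.
        rot = np.array([[c, 0.0, s],
                        [0.0, 1.0, 0.0],
                        [-s, 0.0, c]])
        rotated = pose_3d @ rot.T
        views.append(rotated[:, :2])  # orthographic projection: drop depth
    return np.stack(views)
```

In a perspective-camera setting the projection step would also divide by depth, but the rotation-then-project structure is the same.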

Highlights

  • In this paper, we study the problem of monocular 3D human pose estimation based on deep learning

  • In the 3D pose estimation task, more generated views proved more informative than fewer views, which demonstrates that the Multi-view Pose Generator (MvPG)-16 module effectively extracts multi-view features

  • We propose a Multi-view Pose Generator (MvPG) for 3D pose estimation from a novel perspective


Summary

Introduction

Research on 3D pose estimation has mainly focused on three directions: 2D-to-3D pose estimation [10,13], monocular image-based 3D pose estimation [8,10,14,15], and multi-view image-based 3D pose estimation [16,17,18,19]. These methods were mainly evaluated on the Human3.6M dataset [20], which was collected in a highly constrained environment with limited subjects and background variations (Symmetry 2020, 12, 1116). Multi-view methods benefit from access to more available information and achieve better performance than single-image methods; however, they require multi-view datasets during training, and such datasets are more difficult to obtain. We propose a novel loss function for constraining both joint points and bone length.
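The exact form of the proposed loss is not given in this summary, but a loss constraining both joint points and bone length is typically a per-joint position error plus a bone-length consistency term over the skeleton edges. The sketch below illustrates that structure; the bone list, weighting factor `lam`, and function name are illustrative assumptions, not the paper's definition.

```python
import numpy as np

# Hypothetical skeleton: pairs of joint indices forming bones.
BONES = [(0, 1), (1, 2)]

def pose_loss(pred, gt, bones=BONES, lam=0.5):
    """Combined loss: mean per-joint position error plus a bone-length
    consistency term (a sketch of a joint + bone-length objective).

    pred, gt: (J, 3) arrays of predicted and ground-truth joints.
    lam: weight of the bone-length term (assumed hyperparameter).
    """
    # Joint term: mean Euclidean distance between corresponding joints.
    joint_err = np.mean(np.linalg.norm(pred - gt, axis=-1))

    def bone_lengths(p):
        return np.array([np.linalg.norm(p[i] - p[j]) for i, j in bones])

    # Bone term: mismatch between predicted and ground-truth bone lengths.
    bone_err = np.mean(np.abs(bone_lengths(pred) - bone_lengths(gt)))
    return joint_err + lam * bone_err
```

Note that the bone term is invariant to global translation of the prediction, so it specifically penalizes skeletal deformation rather than overall placement.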

Related Work
Multi-View 3D Pose Estimation
Single-View 3D Pose Estimation
GCNs for 3D Pose Estimation
Multi-View Pose Generator
Network Design
Loss Function
Experiments
Setting
Ablation Study
Performance Analysis of the Number of Views Generated by MvPG
Impact of MvPG on 3D Pose Estimation Network
Comparison with the State of the Art
Qualitative Results
Conclusions