Abstract
In this paper, we study monocular 3D human pose estimation based on deep learning. Because it relies on a single view, monocular pose estimation cannot avoid the inherent occlusion problem. A common remedy is multi-view 3D pose estimation; however, single-view images cannot be used directly in multi-view methods, which greatly limits practical applications. To address these issues, we propose a novel end-to-end network for monocular 3D human pose estimation. First, we propose a multi-view pose generator that predicts multi-view 2D poses from the 2D pose of a single view. Second, we propose a simple but effective data augmentation method for generating multi-view 2D pose annotations, because existing datasets (e.g., Human3.6M) do not contain large numbers of 2D pose annotations from different views. Third, we employ a graph convolutional network to infer a 3D pose from the multi-view 2D poses. Experiments on public datasets verify the effectiveness of our method, and ablation studies show that it improves the performance of existing 3D pose estimation networks.
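The abstract notes that multi-view 2D pose annotations are generated from existing datasets. One plausible way to do this, sketched below under assumptions (the paper's actual augmentation may differ), is to rotate a ground-truth 3D skeleton about the vertical axis and project it orthographically, yielding one synthetic 2D pose per view; the function name and the orthographic camera model are illustrative, not taken from the paper.

```python
import numpy as np

def rotate_and_project(pose_3d, angles):
    """Generate synthetic multi-view 2D poses from one 3D pose.

    Hypothetical sketch: rotate the skeleton about the vertical (y) axis
    and project orthographically onto the image plane.

    pose_3d: (J, 3) array of joint coordinates (x, y, z).
    angles:  iterable of rotation angles in radians, one per view.
    Returns: (V, J, 2) array of 2D poses, one per view.
    """
    views = []
    for theta in angles:
        c, s = np.cos(theta), np.sin(theta)
        # Rotation matrix about the y axis.
        rot = np.array([[c, 0.0, s],
                        [0.0, 1.0, 0.0],
                        [-s, 0.0, c]])
        rotated = pose_3d @ rot.T
        views.append(rotated[:, :2])  # orthographic projection: drop z
    return np.stack(views)
```

A real pipeline would use the dataset's calibrated camera parameters for perspective projection; the orthographic drop-z step only keeps the sketch short.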
Highlights
(1) We study the problem of monocular 3D human pose estimation based on deep learning.
(2) In the 3D pose estimation task, using more views outperformed using fewer views, which demonstrates that the Multi-view Pose Generator (MvPG)-16 module effectively extracts multi-view features.
(3) We propose a Multi-view Pose Generator (MvPG) for 3D pose estimation from a novel perspective.
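The abstract states that a graph convolutional network infers the 3D pose from the generated multi-view 2D poses. A minimal sketch of one graph-convolution step over a skeleton graph is shown below; the 4-joint chain, the Kipf-Welling-style symmetric normalization, and the function name are assumptions for illustration, since the paper's exact GCN variant and joint layout are not given in this summary.

```python
import numpy as np

# Hypothetical skeleton: a 4-joint chain. A real model would use the
# dataset's skeleton (e.g., Human3.6M's joint layout).
EDGES = [(0, 1), (1, 2), (2, 3)]

def graph_conv(x, weight, edges=EDGES):
    """One graph-convolution step: H' = D^{-1/2} (A + I) D^{-1/2} X W.

    x:      (J, C_in) per-joint features, e.g., stacked multi-view 2D coords.
    weight: (C_in, C_out) learnable matrix.
    """
    n = x.shape[0]
    adj = np.eye(n)  # A + I: self-loops included
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    deg = adj.sum(axis=1)
    norm = adj / np.sqrt(np.outer(deg, deg))  # symmetric normalization
    return norm @ x @ weight
```

Stacking several such layers with nonlinearities, ending in a layer that maps per-joint features to (x, y, z), gives the general shape of a GCN-based 2D-to-3D lifter.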
Summary
Research on 3D pose estimation has mainly focused on three directions: 2D-to-3D pose estimation [10,13], monocular image-based 3D pose estimation [8,10,14,15], and multi-view image-based 3D pose estimation [16,17,18,19]. These methods are mainly evaluated on the Human3.6M dataset [20], which was collected in a highly constrained environment with limited subjects and background variations. Multi-view methods have access to more information and achieve better performance than single-image methods (Symmetry 2020, 12, 1116), but they require multi-view datasets during training, and such datasets are more difficult to obtain. We also propose a novel loss function that constrains both joint positions and bone lengths.
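The summary only states that the proposed loss constrains both joint positions and bone lengths; the exact formulation is not given here. A hedged sketch of one way to combine the two terms follows, with the bone list, squared-error terms, and weighting all assumptions for illustration.

```python
import numpy as np

# Hypothetical (parent, child) bone pairs for a 4-joint chain; the paper's
# skeleton (e.g., Human3.6M's 17 joints) may differ.
BONES = [(0, 1), (1, 2), (2, 3)]

def pose_loss(pred, target, bones=BONES, w_bone=0.5):
    """Sketch of a loss combining joint-position and bone-length terms.

    pred, target: (J, 3) arrays of 3D joint coordinates.
    w_bone: assumed weight balancing the two terms.
    """
    # Mean squared error over joint positions.
    joint_term = np.mean(np.sum((pred - target) ** 2, axis=-1))
    # Mean squared error over bone lengths.
    bone_term = 0.0
    for p, c in bones:
        len_pred = np.linalg.norm(pred[c] - pred[p])
        len_tgt = np.linalg.norm(target[c] - target[p])
        bone_term += (len_pred - len_tgt) ** 2
    bone_term /= len(bones)
    return joint_term + w_bone * bone_term
```

The bone-length term penalizes skeleton-scale errors that a pure joint-position loss can leave underconstrained, which is the stated motivation for constraining both quantities.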