Abstract
We propose a new method for real-world 3-D human pose estimation from a single camera. The method combines multitask training with iterative pose refinement driven by a novel conditional attention mechanism. During refinement, the output of each convolutional layer is conditioned on the latest pose estimate. This is done with a conditioned squeeze-and-excitation architecture that adds novel feedback connections. We train jointly on an in-the-wild 2-D pose dataset and a controlled 3-D pose dataset. This multitask training enables real-world 3-D pose estimation without a large-scale in-the-wild 3-D pose dataset, which does not exist. We run experiments on several real-world datasets, as well as on the Human 3.6 Million and HumanEva-I datasets. They show that combining the attention mechanism, the iterative refinement scheme, and multitask training yields robust, competitive performance with only a simple network architecture. We also show that the method is efficient enough to produce pose estimates in real time on commodity hardware.
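The conditioning idea can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and every name in it (`ConditionedSE`, `refine_pose`, `rand_matrix`) is hypothetical. The sketch rests on several assumptions:

- The pose estimate is concatenated with the squeezed (globally average-pooled) channel descriptor before the excitation layers.
- The convolutions between gates are omitted.
- The pose is refined additively through an untrained linear head.
- All weights are random rather than trained.

Under these assumptions it shows how feeding the latest pose back into each gate changes the channel weighting on every refinement iteration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    # Dense matrix-vector product on nested lists.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng, scale=0.1):
    # Hypothetical random init; a real model would learn these weights.
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ConditionedSE:
    """Hypothetical squeeze-and-excitation gate whose excitation also sees the pose."""

    def __init__(self, channels, pose_dim, hidden, rng):
        self.w1 = rand_matrix(hidden, channels + pose_dim, rng)
        self.w2 = rand_matrix(channels, hidden, rng)

    def __call__(self, features, pose):
        # features: list of channels, each a flat list of spatial activations.
        squeezed = [sum(ch) / len(ch) for ch in features]  # global average pool
        z = squeezed + list(pose)  # assumed conditioning: concatenate the pose
        h = [max(0.0, a) for a in matvec(self.w1, z)]  # ReLU
        gates = [sigmoid(a) for a in matvec(self.w2, h)]  # one weight in (0, 1) per channel
        return [[g * x for x in ch] for g, ch in zip(gates, features)]


def refine_pose(features, blocks, head, pose_dim, iterations):
    # Feedback loop: each pass re-gates the features using the latest pose.
    pose = [0.0] * pose_dim
    history = []
    for _ in range(iterations):
        x = features
        for block in blocks:  # convolutions between gates omitted in this sketch
            x = block(x, pose)
        pooled = [sum(ch) / len(ch) for ch in x]
        delta = matvec(head, pooled)  # assumed additive update from a linear head
        pose = [p + d for p, d in zip(pose, delta)]
        history.append(pose)
    return pose, history


rng = random.Random(0)
channels, spatial, pose_dim = 4, 9, 6
features = [[rng.uniform(0.0, 1.0) for _ in range(spatial)] for _ in range(channels)]
blocks = [ConditionedSE(channels, pose_dim, hidden=8, rng=rng) for _ in range(2)]
head = rand_matrix(pose_dim, channels, rng)
pose, history = refine_pose(features, blocks, head, pose_dim, iterations=3)
print(len(history), len(pose))  # prints "3 6"
```

Conditioning the excitation step rather than the raw feature maps keeps the extra cost small. Only a short vector passes through the two small dense layers, which is consistent with the abstract's emphasis on real-time operation, although this sketch makes no claim about the paper's actual layer sizes.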