3-D Human Pose Estimation Using Iterative Conditional Squeeze and Excitation Networks

Niall Mclaughlin, Paul Miller, Jesus Martinez-Del-Rincon

doi:10.1109/tcyb.2020.2964992

Abstract

We propose a new method for single-camera real-world 3-D human pose estimation. Our method uses multitask training together with iterative pose refinement using a novel conditional attention mechanism. For iterative pose refinement, the output of each convolutional layer is conditioned on the latest pose estimate, using a conditioned squeeze-and-excitation network architecture that incorporates novel feedback connections. Multitask training on both an in-the-wild 2-D pose dataset and a controlled 3-D pose dataset allows for real-world 3-D pose estimation without the need for a large-scale in-the-wild 3-D pose dataset, which is unavailable. Experiments are performed on several real-world datasets, as well as the Human 3.6 Million and HumanEva-I datasets, to show that the combined attention mechanism, iterative refinement scheme, and multitask training allow us to achieve robust and competitive performance with only a simple network architecture. In addition, we show that our method is efficient enough to run on commodity hardware, producing pose estimates in real time.
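
The conditioning idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that conditioning is done by concatenating the current pose estimate to the pooled channel descriptor before the gating layers, and it uses random, untrained weights. All names (`squeeze`, `dense`, `conditional_se`, `refine_pose`, `w_out`) and all sizes are hypothetical and chosen only for illustration.

```python
import math
import random


def squeeze(features):
    """Global average pool: one scalar per channel (the 'squeeze' step)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]


def dense(x, weights, bias):
    """Plain fully connected layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def conditional_se(features, pose, params):
    """Rescale each channel with gates computed from pooled features and the pose.

    Assumption: the pose vector is concatenated to the squeezed descriptor.
    The wiring used in the paper may differ.
    """
    z = squeeze(features) + list(pose)  # condition the gates on the pose
    hidden = [max(0.0, h) for h in dense(z, params["w1"], params["b1"])]  # ReLU
    logits = dense(hidden, params["w2"], params["b2"])
    gates = [1.0 / (1.0 + math.exp(-g)) for g in logits]  # sigmoid 'excitation'
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, features)]
    return scaled, gates


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def refine_pose(features, n_joints=2, n_iters=3, seed=0):
    """Iterative refinement: each pass feeds the latest pose back into the gates."""
    rng = random.Random(seed)
    n_ch, pose_dim, n_hidden = len(features), 3 * n_joints, 4
    params = {
        "w1": random_matrix(n_hidden, n_ch + pose_dim, rng),
        "b1": [0.0] * n_hidden,
        "w2": random_matrix(n_ch, n_hidden, rng),
        "b2": [0.0] * n_ch,
    }
    w_out = random_matrix(pose_dim, n_ch, rng)  # hypothetical pose regressor
    pose = [0.0] * pose_dim  # initial estimate: all zeros
    history = []
    for _ in range(n_iters):
        scaled, gates = conditional_se(features, pose, params)
        pose = dense(squeeze(scaled), w_out, [0.0] * pose_dim)
        history.append((pose, gates))
    return history
```

Calling `refine_pose` on a small list of channels (each a 2-D list of floats) returns one `(pose, gates)` pair per iteration, which makes it easy to inspect how the channel gates change as the pose estimate is fed back. In a trained network the gating and regression weights would be learned jointly; here they only demonstrate the data flow.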

Full Text