Abstract

Adversarial learning plays an important role in recovering 3D human pose and shape from monocular videos, yet the effectiveness of the adversarial process itself is rarely examined. We therefore aim to improve the performance of adversarial learning in 3D human pose and shape estimation, which depends mainly on two components: the generator and the discriminator. For the generator, we exploit temporal information more deeply by adding an attention-based temporal encoder that models the time series of features, yielding a data representation better suited to pose and shape regression. For the discriminator, we make use of human skeleton topology information when extracting features from the estimation results; to this end, we base the discriminator's design on a graph convolutional network. In addition, to reduce jitter in the estimation results, we design a rotation-disentangled smoothing module that processes the estimated rotation parameters. We conduct extensive experiments on the public in-the-wild datasets 3DPW and MPI-INF-3DHP. On both datasets, our method achieves higher accuracy and lower acceleration error than previous methods.
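
To illustrate how skeleton topology can enter feature extraction, the following is a minimal sketch of a single graph convolution over a joint graph. It is not the discriminator described above: the 5-joint toy skeleton, the feature sizes, the random weights, and the helper names `normalized_adjacency` and `graph_conv` are all assumptions made for illustration, and the symmetric normalization is the commonly used one rather than a detail confirmed by the abstract.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton (an assumption, not the body model used in the paper):
# 0 = pelvis, 1 = spine, 2 = head, 3 = left hand, 4 = right hand
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def normalized_adjacency(num_joints, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = np.eye(num_joints)  # self-loops keep each joint's own features
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_hat, weight):
    """One graph convolution: average features along bones, project, apply ReLU."""
    return np.maximum(a_hat @ x @ weight, 0.0)


rng = np.random.default_rng(0)
a_hat = normalized_adjacency(5, EDGES)
x = rng.normal(size=(5, 6))  # per-joint input features (size 6 is an arbitrary choice)
w = rng.normal(size=(6, 8))  # projection matrix; learned in practice, random here
h = graph_conv(x, a_hat, w)  # per-joint output features, shape (5, 8)
```

Because `a_hat` is zero wherever two joints share no bone, each output row mixes information only from a joint and its direct neighbours; stacking several such layers lets features propagate along the kinematic chain.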
