Abstract

3D human pose and shape estimation is crucial in many computer vision applications. Although numerous deep learning methods have been designed for this problem, most of them train networks only on RGB images from a single viewpoint, even though several public datasets are captured with multi-view camera systems. In this paper, a new approach is proposed that combines a regression-based multi-view image learning loop with an optimization-based multi-view model. A convolutional neural network (CNN) first regresses the pose and shape parameters of a parameterized human body model from multi-view images. An enhanced multi-view optimization method, MV-SMPLify, is then introduced; it fits the SMPL model to the multi-view images, using the regressed pose and shape as initial values. The optimized parameters are in turn used to supervise the training of the CNN. The resulting Self-avatar framework is self-supervised and combines the strengths of the CNN-based and optimization-based strategies, while the use of multi-view images provides more comprehensive supervision during training. Qualitative and quantitative experiments on public datasets show that this method outperforms previous approaches in several respects.
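
The abstract describes a regress-then-fit training loop: the CNN predicts SMPL pose and shape from multiple views, MV-SMPLify refines those predictions against the multi-view evidence, and the refined parameters supervise the next regression step. The sketch below illustrates only that loop structure under stated assumptions; it is not the authors' implementation. The SMPL layer and the MV-SMPLify objective are replaced by simple stand-ins, and all module names, tensor shapes, camera models, and loss weights are hypothetical.

```python
# Minimal sketch of a regression-plus-optimization self-supervised loop over
# multiple views. SMPL and MV-SMPLify are replaced by toy stand-ins so the
# sketch is self-contained; everything here is an illustrative assumption.
import torch
import torch.nn as nn

N_VIEWS, N_JOINTS = 4, 24          # assumed number of cameras and body joints
POSE_DIM, SHAPE_DIM = 72, 10       # SMPL axis-angle pose and shape dimensions


class MultiViewRegressor(nn.Module):
    """CNN stand-in: maps multi-view images to SMPL pose/shape parameters."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(            # placeholder for a real CNN backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(16 * N_VIEWS, POSE_DIM + SHAPE_DIM)

    def forward(self, views):                     # views: (B, N_VIEWS, 3, H, W)
        b = views.shape[0]
        feats = self.backbone(views.flatten(0, 1)).reshape(b, -1)
        params = self.head(feats)
        return params[:, :POSE_DIM], params[:, POSE_DIM:]


def project_joints(pose, shape, cams):
    """Toy SMPL + camera stand-in: returns per-view 2D joints (B, N_VIEWS, N_JOINTS, 2)."""
    joints3d = pose.reshape(-1, N_JOINTS, 3) + shape.mean(dim=1, keepdim=True)[:, None]
    return joints3d[:, None, :, :2] + cams[None, :, None, :]   # trivial "projection"


def mv_smplify(pose_init, shape_init, keypoints2d, cams, steps=20, lr=0.05):
    """Stand-in for MV-SMPLify: refine pose/shape by minimizing the 2D reprojection
    error summed over all views, starting from the regressed estimates."""
    pose = pose_init.detach().clone().requires_grad_(True)
    shape = shape_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([pose, shape], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (project_joints(pose, shape, cams) - keypoints2d).pow(2).mean()
        loss.backward()
        opt.step()
    return pose.detach(), shape.detach()


def training_step(model, optimizer, views, keypoints2d, cams):
    """One self-supervised iteration: regress, fit in the loop, then supervise the
    regressor with the fitted parameters plus a multi-view reprojection term."""
    pose_pred, shape_pred = model(views)
    pose_fit, shape_fit = mv_smplify(pose_pred, shape_pred, keypoints2d, cams)

    param_loss = (pose_pred - pose_fit).pow(2).mean() + (shape_pred - shape_fit).pow(2).mean()
    reproj_loss = (project_joints(pose_pred, shape_pred, cams) - keypoints2d).pow(2).mean()
    loss = param_loss + 0.1 * reproj_loss          # loss weighting is an assumption

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = MultiViewRegressor()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    views = torch.randn(2, N_VIEWS, 3, 64, 64)             # dummy multi-view batch
    keypoints2d = torch.randn(2, N_VIEWS, N_JOINTS, 2)      # dummy 2D keypoints per view
    cams = torch.randn(N_VIEWS, 2)                          # dummy per-view offsets
    print("loss:", training_step(model, optimizer, views, keypoints2d, cams))
```

The key design point mirrored here is that the optimizer runs inside the training loop and is initialized from the network's own prediction, so the fitted parameters act as pseudo-ground-truth that improves as the regressor improves.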
