Abstract

This paper presents a novel approach to recover true fine surface detail of deforming meshes reconstructed from multi-view video. Template-based methods for performance capture usually produce a coarse-to-medium scale 4D surface reconstruction that lacks the real high-frequency geometric detail present in the original video footage. Fine-scale deformation is often incorporated in a second pass by using stereo constraints, features, or shading-based refinement. In this paper, we propose an alternative solution to this second stage by formulating dense dynamic surface reconstruction as a global optimization problem over the densely deforming surface. Our main contribution is an implicit representation of a deformable mesh that uses a set of Gaussian functions on the surface to represent the initial coarse mesh, and a set of Gaussians for the images to represent the original captured multi-view images. We find the fine-scale deformations of all mesh vertices that maximize photo-temporal consistency by densely optimizing our model-to-image consistency energy over all vertex positions. Our formulation yields a smooth, closed-form energy with implicit occlusion handling and analytic derivatives. Furthermore, it does not require error-prone correspondence finding or discrete sampling of surface displacement values. We demonstrate our approach on a variety of datasets of human subjects wearing loose clothing and performing different motions, and we show qualitatively and quantitatively that our technique successfully reproduces finer detail than the input baseline geometry.
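The "smooth, closed-form energy" above can be made concrete with a standard identity. As a minimal sketch (using our own illustrative notation, not necessarily the paper's), assume unnormalized isotropic 2D Gaussians with means μ and standard deviations σ in image space; the overlap of a projected surface Gaussian i and an image Gaussian j then has the closed form:

```latex
\int_{\mathbb{R}^2}
  \exp\!\Big(-\tfrac{\|\mathbf{x}-\boldsymbol{\mu}_i\|^2}{2\sigma_i^2}\Big)\,
  \exp\!\Big(-\tfrac{\|\mathbf{x}-\boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\Big)\,
  \mathrm{d}\mathbf{x}
=
\frac{2\pi\,\sigma_i^2\sigma_j^2}{\sigma_i^2+\sigma_j^2}\,
\exp\!\Big(-\tfrac{\|\boldsymbol{\mu}_i-\boldsymbol{\mu}_j\|^2}{2(\sigma_i^2+\sigma_j^2)}\Big)
```

Because this expression is smooth in the Gaussian means, and the means are in turn smooth functions of the mesh vertex positions, an energy built from such terms admits analytic derivatives and can be optimized with standard gradient-based methods.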

Highlights

  • Over the last decade, performance capture techniques have enabled the 3D reconstruction of the motion and the appearance of real-world scenes, such as human motion or facial expressions, using a multi-camera capture setup (de Aguiar et al 2008; Gall et al 2009a; Bradley et al 2008; Vlasic et al 2008)

  • Input coarse meshes are obtained using techniques based on sparse feature matching, shape-from-silhouette and multi-view 3D reconstruction, and lack fine surface detail

  • Extending the original work presented in Robertini et al (2014), we present an effective framework for performance capture of deforming meshes with fine-scale, time-varying surface detail from multi-view video recordings


Summary

Introduction

Performance capture techniques have enabled the 3D reconstruction of the motion and the appearance of real-world scenes, such as human motion or facial expressions, using a multi-camera capture setup (de Aguiar et al 2008; Gall et al 2009a; Bradley et al 2008; Vlasic et al 2008). Unlike previous performance capture methods, we formulate the model-to-image photo-consistency energy that guides the deformation as a closed-form expression and compute its analytic derivatives. This enables implicit handling of occlusions, as well as spatial and temporal coherence constraints, while preserving a smooth consistency energy function. We show that our approach reconstructs more of the correct fine-scale detail present in the input video sequences, for instance the wrinkles in a skirt, than both baseline methods (i.e., template-based and template-free). We also provide an extensive set of results and experiments to validate the performance of our improved approach.
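To make the optimization loop concrete, the following is a minimal sketch (not the authors' implementation) of gradient ascent on a Gaussian-overlap photo-consistency energy in 2D image space. The function names, the color-similarity weighting, and the fixed-camera setup are our illustrative assumptions; occlusion handling and the mesh-to-image projection step are omitted.

```python
# Minimal sketch: gradient ascent on a Gaussian-overlap photo-consistency
# energy in 2D image space. Unnormalized isotropic Gaussians are assumed.
import numpy as np

def overlap(mu_i, s_i, mu_j, s_j):
    """Closed-form integral of the product of two unnormalized
    isotropic 2D Gaussians with means mu and std devs s."""
    s2 = s_i**2 + s_j**2
    d2 = np.sum((mu_i - mu_j)**2)
    return 2.0 * np.pi * (s_i**2 * s_j**2 / s2) * np.exp(-d2 / (2.0 * s2))

def energy_and_grad(surf_mu, surf_s, surf_c, img_mu, img_s, img_c):
    """Similarity energy and its analytic gradient w.r.t. the projected
    surface-Gaussian means.

    surf_mu: (M,2) projected surface-Gaussian centers (optimized)
    img_mu:  (N,2) image-Gaussian centers (fixed, from the input frame)
    *_s: std devs, *_c: RGB colors in [0,1]
    """
    E = 0.0
    grad = np.zeros_like(surf_mu)
    for j in range(len(surf_mu)):
        for i in range(len(img_mu)):
            # Color similarity gates the spatial overlap (our choice here).
            w = max(0.0, 1.0 - np.sum((surf_c[j] - img_c[i])**2))
            o = overlap(img_mu[i], img_s[i], surf_mu[j], surf_s[j])
            E += w * o
            s2 = img_s[i]**2 + surf_s[j]**2
            # d/dmu_j of exp(-||mu_i - mu_j||^2 / (2 s2)) pulls mu_j
            # toward the image Gaussian it agrees with in color.
            grad[j] += w * o * (img_mu[i] - surf_mu[j]) / s2
    return E, grad

# Toy usage: two surface Gaussians drift toward matching image Gaussians.
rng = np.random.default_rng(0)
img_mu = np.array([[10.0, 10.0], [20.0, 15.0]])
img_s = np.array([2.0, 2.0])
img_c = np.array([[1, 0, 0], [0, 1, 0]], float)
surf_mu = img_mu + rng.normal(0.0, 1.5, img_mu.shape)
surf_s = np.array([2.0, 2.0])
surf_c = img_c.copy()
for it in range(50):
    E, g = energy_and_grad(surf_mu, surf_s, surf_c, img_mu, img_s, img_c)
    surf_mu += 0.5 * g  # gradient ascent: maximize photo-consistency
print("final energy:", E)
```

In the full method the energy additionally sums over all cameras and is combined with regularization and temporal smoothing terms (see the section list below); since every term is smooth and closed-form, no discrete sampling of displacement values is needed.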

Related Work
Overview
Implicit Model
Surface Gaussians
Image Gaussians
Projection of Surface Gaussians
Similarity Term
Regularization Term
Temporal Smoothing Term
Optimization
Datasets
Computational Complexity and Efficiency
Evaluation
Skirt Sequence
Dance Sequence
Pop2lock Sequence
Handstand and Wheel Sequences
Parameters
Limitations and Future Work
Conclusions
