Abstract
This paper presents a neural rendering method for controllable portrait video synthesis. Recent advances in volumetric neural rendering, such as neural radiance fields (NeRF), have enabled photorealistic novel view synthesis of static scenes with impressive results. However, modeling dynamic and controllable objects as part of a scene with such scene representations remains challenging. In this work, we design a system that enables 1) novel view synthesis for portrait video, covering both the human subject and the scene they are in, and 2) explicit control of the facial expressions through a low-dimensional expression representation. We represent the distribution of human facial expressions using the expression parameters of a 3D Morphable Model (3DMM) and condition the NeRF volumetric function on them. To guide the network toward disentangled control of static scene appearance and dynamic facial actions, we impose a spatial prior via 3DMM fitting. We demonstrate the effectiveness of our method on free-view synthesis of portrait videos with expression control. To train a scene, our method requires only a short video of a subject captured by a mobile device.
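As a rough illustration of the conditioning idea described above, the following sketch shows how a NeRF-style volumetric function might take a 3DMM expression code as an additional input alongside the sampled point and view direction. This is not the authors' implementation; the network architecture, layer sizes, expression dimension, and positional-encoding setup are assumptions for illustration only.

```python
# Illustrative sketch: an expression-conditioned NeRF-style MLP.
# Layer sizes, the 3DMM expression dimension (76), and the positional-encoding
# setup are assumptions, not the paper's exact design.
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs):
    """Standard NeRF-style positional encoding: sin/cos at increasing frequencies."""
    feats = [x]
    for i in range(num_freqs):
        feats.append(torch.sin((2.0 ** i) * x))
        feats.append(torch.cos((2.0 ** i) * x))
    return torch.cat(feats, dim=-1)

class ExpressionConditionedNeRF(nn.Module):
    def __init__(self, expr_dim=76, pos_freqs=10, dir_freqs=4, hidden=256):
        super().__init__()
        pos_in = 3 * (1 + 2 * pos_freqs)   # encoded 3D point
        dir_in = 3 * (1 + 2 * dir_freqs)   # encoded view direction
        self.pos_freqs, self.dir_freqs = pos_freqs, dir_freqs
        # The trunk sees both the point and the expression code, so geometry
        # and appearance can vary with facial expression.
        self.trunk = nn.Sequential(
            nn.Linear(pos_in + expr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)  # volume density
        # The color head additionally sees the view direction for view-dependent effects.
        self.color_head = nn.Sequential(
            nn.Linear(hidden + dir_in, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, points, view_dirs, expression):
        """points: (N,3), view_dirs: (N,3), expression: (N, expr_dim) 3DMM code."""
        h = self.trunk(torch.cat(
            [positional_encoding(points, self.pos_freqs), expression], dim=-1))
        sigma = self.sigma_head(h)
        rgb = self.color_head(torch.cat(
            [h, positional_encoding(view_dirs, self.dir_freqs)], dim=-1))
        return rgb, sigma

# Usage: query colors/densities along camera rays for a chosen expression vector.
model = ExpressionConditionedNeRF()
pts, dirs = torch.rand(1024, 3), torch.rand(1024, 3)
expr = torch.zeros(1024, 76)  # e.g., a neutral expression code (illustrative)
rgb, sigma = model(pts, dirs, expr)
```

In such a design, changing the expression vector at render time changes the predicted density and color in the facial region, while static scene content is intended to remain unaffected (aided in the paper by the spatial prior from 3DMM fitting).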