Learning-Based Non-rigid Video Depth Estimation Using Invariants to Generalized Bas-Relief Transformations

Matteo Pedone,Abdelrahman Mostafa,Janne Heikkilä

doi:10.1007/s10851-022-01105-y

Matteo Pedone, Abdelrahman Mostafa + Show 1 more

https://doi.org/10.1007/s10851-022-01105-y

Copy DOI

Abstract

We present a method to locally reconstruct dense video depth maps of a non-rigidly deformable object directly from a video sequence acquired by a static orthographic camera. The estimation of depth is performed locally on spatiotemporal patches of the video, and then, the full depth video is recovered by combining them together. Since the geometric complexity of a local spatiotemporal patch of a deforming non-rigid object is often simple enough to be faithfully represented with a parametric model, we artificially generate a database of small deforming rectangular meshes rendered with different material properties and light conditions, along with their corresponding depth videos, and use such data to train a convolutional neural network. Since the database images are rendered with an orthographic camera model, linear deformations along the optical axis cannot be recovered from the training images. These are known in the literature as generalized bas-relief (GBR) transformations. We address this ambiguity problem by employing the invariant-theoretic normalization procedure in order to obtain complete invariants with respect to this group of transformations, and use them in the loss function of a neural network. We tested our method on both synthetic and Kinect data and experimentally observed that the reconstruction error is significantly lower than the one obtained using conventional non-rigid structure from motion approaches and state-of-the-art video depth estimation techniques.

Full Text