Texture-Generic Deep Shape-From-Template

David Fuentes-Jimenez, David Casillas-Perez, Toby Collins, Adrien Bartoli, Daniel Pizarro

doi:10.1109/access.2021.3082011

Open Access

Abstract

Shape-from-Template (SfT) solves the registration and 3D reconstruction of a deformable 3D object, represented by the template, from a single image. Recently, deep learning methods have solved SfT for the wide-baseline case in real-time, clearly surpassing classical methods. However, the main limitation of current methods is that the neural models must be fine-tuned to the specific geometry and appearance represented by the template texture map. We propose the first texture-generic deep learning SfT method, which adapts to new texture maps at run-time without texture-specific fine-tuning. We achieve this by dividing the problem into a segmentation step and a registration-and-reconstruction step, both solved with deep learning. We include the template texture map as one of the neural inputs in both steps and train our models to adapt to different texture maps. We show that our method obtains results comparable to or better than those of previous deep learning models, which are texture-specific. It works under challenging imaging conditions, including complex deformations, occlusions, motion blur and poor textures. Our implementation runs in real-time on a low-cost GPU and CPU.
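The abstract describes a two-step pipeline in which the template texture map is an input to both networks rather than being baked into their weights. The sketch below illustrates only that data flow, under our own assumptions: the texture map is resized to the image grid and concatenated along the channel axis, and each DNN is replaced by a hypothetical per-pixel linear layer (`pointwise_layer`). The output layout (a 2-channel registration map `uv` plus a `depth` map) and all function names are illustrative, not the authors' architecture.

```python
import numpy as np


def resize_nearest(img, h, w):
    """Nearest-neighbour resize of an (Ht, Wt, C) array to (h, w, C)."""
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    return img[rows][:, cols]


def conditioned_input(image, texture):
    """Concatenate the image with the texture map resized to the image grid: (H, W, 6)."""
    h, w, _ = image.shape
    return np.concatenate([image, resize_nearest(texture, h, w)], axis=-1)


def pointwise_layer(x, weights, bias):
    """Hypothetical stand-in for a trained DNN: a per-pixel (1x1) linear layer."""
    return x @ weights + bias


def sft_pipeline(image, texture, seg_params, reg_params):
    # Step 1: object segmentation, conditioned on the template texture map.
    seg_logits = pointwise_layer(conditioned_input(image, texture), *seg_params)[..., 0]
    mask = seg_logits > 0.0
    # Step 2: registration and reconstruction on the masked image, again conditioned on the texture.
    masked = image * mask[..., None]
    out = pointwise_layer(conditioned_input(masked, texture), *reg_params)
    uv, depth = out[..., :2], out[..., 2]  # views into `out`
    # Pixels outside the predicted mask get no registration or depth.
    uv[~mask] = np.nan
    depth[~mask] = np.nan
    return mask, uv, depth


rng = np.random.default_rng(0)
image = rng.random((48, 64, 3))      # input image, H x W x RGB
texture = rng.random((128, 128, 3))  # template texture map, any size
seg_params = (rng.standard_normal((6, 1)), np.zeros(1))
reg_params = (rng.standard_normal((6, 3)), np.zeros(3))
mask, uv, depth = sft_pipeline(image, texture, seg_params, reg_params)
print(mask.shape, uv.shape, depth.shape)  # (48, 64) (48, 64, 2) (48, 64)
```

Because the texture map is an argument of `sft_pipeline`, swapping in a new texture requires no change to `seg_params` or `reg_params`; this mirrors the texture-generic idea of reusing one set of weights for any texture map.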

Highlights

Image registration and image-based 3D reconstruction are fundamental problems extensively studied in Computer Vision
Prior knowledge of the object, provided by the template, is used to combat the general ill-posedness of 3D reconstruction from a single image, restricting the space of possible solutions to ones that are physically viable
We propose a new semantic segmentation architecture that takes the template texture map as one of its inputs; this clearly differs from classical category-level semantic segmentation methods, in which the semantic categories are learned and fixed during training
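The second highlight says the template restricts solutions to physically viable ones. The text does not say which deformation prior is meant; a common choice in the SfT literature, assumed here purely for illustration, is (quasi-)isometry, under which mesh edge lengths are preserved. The hypothetical helper `isometry_residual` below scores a deformed mesh against its template this way: rigid motions score zero, while stretching does not.

```python
import numpy as np


def edge_lengths(vertices, edges):
    """Euclidean length of each mesh edge; vertices is (N, 3), edges is (E, 2)."""
    return np.linalg.norm(vertices[edges[:, 0]] - vertices[edges[:, 1]], axis=1)


def isometry_residual(template_vertices, deformed_vertices, edges):
    """Mean squared change in edge length between the template and a deformed mesh."""
    d0 = edge_lengths(template_vertices, edges)
    d1 = edge_lengths(deformed_vertices, edges)
    return float(np.mean((d1 - d0) ** 2))


# A flat unit square made of two triangles (the diagonal 0-2 is shared).
template = np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.]])
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0], [0, 2]])

# A rigid motion (rotation about z plus a translation) is isometric.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.],
              [np.sin(theta),  np.cos(theta), 0.],
              [0., 0., 1.]])
rigid = template @ R.T + np.array([0.5, -0.2, 2.0])

# Stretching by 1.5 along x changes edge lengths, so it is not isometric.
stretched = template * np.array([1.5, 1.0, 1.0])

print(isometry_residual(template, rigid, edges))      # effectively 0
print(isometry_residual(template, stretched, edges))  # strictly positive
```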

Summary

INTRODUCTION

Image registration and image-based 3D reconstruction are fundamental problems extensively studied in Computer Vision. We propose the first DNN-based SfT method that takes the template texture map as a run-time input. This input is used to condition the registration and reconstruction DNNs on the specific texture of the object of interest.

In addition to our base models, we propose a lightweight architecture for the registration-reconstruction network that can be used to run our method in real-time on low-cost GPUs, CPUs and embedded systems. It is based on a new custom decoding layer that implements the inverse block Discrete Cosine Transform (DCT), which allows us to greatly reduce the number of network parameters while controlling the loss of information induced by the new decoding layers.
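The introduction only names the idea of an inverse block DCT decoding layer. Below is a minimal NumPy sketch under our own assumptions: an 8x8 block size, a single-channel output map (for example depth), and keeping only the top-left k x k low-frequency coefficients of each block. The names `dct_matrix`, `block_dct` and `inverse_block_dct` are hypothetical, not taken from the authors' code. Because the inverse DCT is a fixed linear map, the layer has no trainable weights; a network feeding it only needs to predict k² values per block instead of 64.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: a block x has coefficients X = C @ x @ C.T."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def block_dct(image, block=8):
    """Forward block DCT of an (H, W) map -> (H/block, W/block, block, block) coefficients."""
    h, w = image.shape
    c = dct_matrix(block)
    blocks = image.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    return np.einsum('ij,hwjl,ml->hwim', c, blocks, c)


def inverse_block_dct(coeffs, block=8):
    """Decode (Hb, Wb, k, k) low-frequency coefficients into an (Hb*block, Wb*block) map.

    Coefficients outside the top-left k x k corner of each block are taken as zero.
    The layer is a fixed linear map with no trainable weights.
    """
    hb, wb, k, _ = coeffs.shape
    full = np.zeros((hb, wb, block, block))
    full[:, :, :k, :k] = coeffs
    c = dct_matrix(block)
    # Inverse orthonormal 2D DCT of every block: x = C.T @ X @ C.
    blocks = np.einsum('ji,hwjl,lm->hwim', c, full, c)
    return blocks.transpose(0, 2, 1, 3).reshape(hb * block, wb * block)


# A smooth 64 x 64 "depth map" (a tilted plane) standing in for a network output.
yy, xx = np.mgrid[0:64, 0:64] / 64.0
depth = 2.0 + 0.5 * xx + 0.3 * yy

coeffs_full = block_dct(depth)          # 8 x 8 blocks, 64 coefficients each
kept = coeffs_full[:, :, :3, :3]        # what a network would predict with k = 3
approx = inverse_block_dct(kept)

print(coeffs_full.size, kept.size)      # 4096 576
print(np.max(np.abs(approx - depth)))   # small truncation error for a smooth map
```

With k = 3, this map needs 576 predicted values instead of 4096, roughly 7x fewer. Raising k toward the block size trades that saving back for fidelity, which is one way to control the information lost by the decoding layer.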

PREVIOUS WORK

DNN ARCHITECTURE

TESTING ON UNSEEN TEXTURE MAPS

CONCLUSION