Abstract

We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. To disentangle these components without supervision, we use the fact that many object categories have, at least approximately, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method can recover the 3D shape of human faces, cat faces and cars from single-view images with high accuracy, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences.
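
To make the factorization concrete, below is a minimal PyTorch-style sketch of such a photo-geometric autoencoder. It is not the authors' implementation: the module names, tensor shapes, renderer interface and the simplified uncertainty-weighted loss are illustrative assumptions. It shows the idea described above: encode the image into depth, albedo, lighting and viewpoint, re-render it, and re-render it again from horizontally flipped depth and albedo so the symmetry assumption can be enforced, with a predicted per-pixel uncertainty map (playing the role of the symmetry probability map) deciding where that assumption should be trusted.

    import torch
    import torch.nn as nn

    class PhotoGeometricAutoencoder(nn.Module):
        """Factors an image into depth, albedo, lighting and viewpoint, and
        reconstructs it twice: once from the raw factors and once from
        horizontally flipped depth and albedo (the symmetry assumption)."""

        def __init__(self, depth_net, albedo_net, light_net, view_net,
                     sigma_net, renderer):
            super().__init__()
            self.depth_net = depth_net    # image -> depth map d        (B, 1, H, W)
            self.albedo_net = albedo_net  # image -> albedo a           (B, 3, H, W)
            self.light_net = light_net    # image -> lighting params l
            self.view_net = view_net      # image -> viewpoint w
            self.sigma_net = sigma_net    # image -> (sigma, sigma_flip) uncertainty maps
            self.renderer = renderer      # shades and reprojects (a, d, l, w) -> image

        def forward(self, img):
            d = self.depth_net(img)
            a = self.albedo_net(img)
            l = self.light_net(img)
            w = self.view_net(img)
            sigma, sigma_flip = self.sigma_net(img)

            # Reconstruction from the predicted factors.
            recon = self.renderer(a, d, l, w)
            # Reconstruction from mirrored albedo and depth: if the object is
            # (approximately) symmetric, this should also match the input.
            recon_flip = self.renderer(torch.flip(a, dims=[3]),
                                       torch.flip(d, dims=[3]), l, w)
            return recon, recon_flip, sigma, sigma_flip

    def uncertainty_weighted_l1(recon, target, sigma, eps=1e-6):
        # Per-pixel reconstruction error, down-weighted where the predicted
        # uncertainty is high (e.g. genuinely asymmetric regions such as hair);
        # the log term stops the network from inflating sigma to ignore errors.
        err = (recon - target).abs().mean(dim=1, keepdim=True)
        return (err / (sigma + eps) + torch.log(sigma + eps)).mean()

During training, summing this loss over both reconstructions (each with its own uncertainty map) would push the encoders to explain the image with a plausibly symmetric shape while letting the uncertainty maps absorb the asymmetric details.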

Highlights

  • The ability to understand and reconstruct the content of images in 3D is of great importance in many computer vision applications

  • We introduce a new learning algorithm that takes as input a collection of single-view images of a deformable object category and produces as output a deep network that can estimate the 3D shape of any object instance given a single image of it (Fig. 1)

  • The 3D shapes are recovered with high fidelity

Summary

Introduction

The ability to understand and reconstruct the content of images in 3D is of great importance in many computer vision applications. When it comes to learning categories of visual objects, for instance to detect and segment them, most approaches model them as 2D patterns [1], with no obvious understanding of their 3D structure. We aim instead to learn the 3D shape of a deformable object category from images alone, under two challenging conditions. The first condition is that no 2D or 3D ground truth information (such as keypoints, segmentation, depth maps, or prior knowledge of a 3D model) is available. The second condition is that learning can only use an unconstrained collection of single-view images; in particular, it does not use multiple views of the same object instance. Learning from single-view images is useful because in many applications we only have a source of independent still images to work with (for example, obtained from an Internet search engine).

