Abstract

The objective of this work is to reconstruct the 3D surfaces of sculptures from one or more images using a view-dependent representation. To this end, we train a network, SiDeNet, to predict the Silhouette and Depth of the surface given a variable number of images; the silhouette is predicted at a different viewpoint from the inputs (e.g. from the side), while the depth is predicted at the viewpoint of the input images. This has three benefits. First, the network learns a representation of shape beyond that of a single viewpoint, as the silhouette forces it to respect the visual hull, and the depth image forces it to predict concavities (which do not appear on the visual hull). Second, as the network learns about 3D using the proxy tasks of predicting depth and silhouette images, it is not limited by the resolution of the 3D representation. Finally, using a view-dependent representation (i.e. additionally encoding the viewpoint with the input image) improves the network's generalisability to unseen objects. Additionally, the network handles the input views flexibly. First, it can ingest a different number of views during training and testing, and reconstruction performance is shown to improve as additional views are added at test time. Second, the additional views need not be photometrically consistent. The network is trained and evaluated on two synthetic datasets: a realistic sculpture dataset (SketchFab) and ShapeNet. The design of the network is validated by comparing to state-of-the-art methods on a set of tasks. It is shown that (i) passing the input viewpoint (i.e. using a view-dependent representation) improves the network's generalisability at test time; (ii) predicting depth/silhouette images allows for higher-quality predictions in 2D, as the network is not limited by the chosen latent 3D representation; and (iii) on both datasets the method of combining views in a global manner performs better than a local method.
Finally, we show that the trained network generalizes to real images, and probe how the network has encoded the latent 3D shape.
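The two design choices highlighted above, encoding the viewpoint alongside each input image and combining a variable number of views in a global manner, can be illustrated with a toy sketch. This is a minimal illustration only, not SiDeNet's actual architecture: the names (`encode_view`, `combine_views`, `W_enc`), the linear encoder, and the sin/cos viewpoint encoding are assumptions for the example; the real network uses convolutional encoders and decoders.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy encoder weights (hypothetical): 1024-d image features + 2-d viewpoint -> 128-d code.
W_enc = rng.standard_normal((1026, 128)) * 0.01

def encode_view(image_feat, viewpoint_angle):
    """Encode one view: concatenate the image features with an encoding of the
    viewpoint (the view-dependent representation), then apply a toy linear+ReLU encoder."""
    v = np.array([np.sin(viewpoint_angle), np.cos(viewpoint_angle)])
    x = np.concatenate([image_feat, v])           # (1026,)
    return np.maximum(x @ W_enc, 0.0)             # (128,)

def combine_views(view_codes):
    """Global combination: an element-wise max over the per-view codes, so the
    same network accepts any number of views at train and test time."""
    return np.max(np.stack(view_codes), axis=0)   # (128,)

# Three views at different (known) viewpoints; the list length is arbitrary.
views = [(rng.standard_normal(1024), a) for a in (0.0, np.pi / 3, np.pi / 2)]
code = combine_views([encode_view(f, a) for f, a in views])
```

Because the pooling is symmetric and order-independent, adding a fourth view at test time changes only the list length, not the network, which mirrors how additional views improve reconstruction without retraining.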

Highlights

  • Learning to infer the 3D shape of complex objects given only a few images is one of the grand challenges of computer vision

  • The evaluation measures used are intersection over union (IoU) for the silhouette, L1 error for the depth, and Chamfer distance when evaluating in 3D

  • Training with silhouettes + depth is compared against variants supervised with depth only, silhouettes only, and full 3D. This comparison is only done on ShapeNet, as for the Sculpture dataset we found it was necessary to subtract off the mean depth to predict high-quality depth maps (Sect. 4.2)
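The three evaluation measures named above are standard, and can be sketched as follows. This is a generic sketch under the usual definitions, not the paper's evaluation code; the function names and the foreground-mask convention for the L1 depth error are assumptions.

```python
import numpy as np

def silhouette_iou(pred, gt):
    """Intersection over union between two binary silhouette masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    inter = np.logical_and(pred, gt).sum()
    return inter / union if union else 1.0

def depth_l1(pred, gt, mask):
    """Mean L1 depth error over foreground pixels selected by a boolean mask."""
    return np.abs(pred[mask] - gt[mask]).mean()

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3):
    mean nearest-neighbour distance from a to b plus from b to a."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

For example, two identical silhouettes give an IoU of 1.0, and the Chamfer distance between a point set and itself is 0.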



Introduction

Learning to infer the 3D shape of complex objects given only a few images is one of the grand challenges of computer vision. Learning-based approaches build on the classic work of Blanz and Vetter (1999) for faces, later extended to other classes such as semantic categories (Kar et al. 2015; Cashman and Fitzgibbon 2013) or cuboidal room structures (Fouhey 2015; Hedau et al. 2009). This work extends this area in two directions: first, it considers 3D shape inference from multiple images rather than a single one (though the single-image case is considered as well); second, it considers the quite generic class of piecewise-smooth textured sculptures and the associated challenges. The views need not be photometrically consistent.

International Journal of Computer Vision (2019) 127:1780–1800

