Abstract

Building good 3D maps is a challenging and expensive task that requires high-quality sensors and careful, time-consuming scanning. We seek to reduce the cost of building good reconstructions by correcting views of existing low-quality ones in a post-hoc fashion using learnt priors over surfaces and appearance. We train a convolutional neural network model to predict the difference in inverse-depth from varying viewpoints of two meshes: one of low quality that we wish to correct, and one of high quality that we use as a reference. Our full model runs at 11.3 Hz when aggregating four input views. In contrast to previous work, we pay attention to the problem of excessive smoothing in corrected meshes. We address this with a suitable network architecture, and introduce a loss-weighting mechanism that emphasizes edges in the prediction. Furthermore, smooth predictions result in geometrical inconsistencies. To deal with this issue, we present a loss function which penalizes re-projection differences that are not due to occlusions. Future applications of this work will incorporate semantic scene understanding in a multi-task learning setting. We explore the efficacy of the proposed system in terms of gross error correction and generalization capability by showing its performance in practice on a subset of the KITTI Odometry dataset, complete with a component-wise ablation study. We evaluate correctness and completeness measures of surface reconstruction across viewpoints and show that the proposed system is introspective in regions lacking sufficient high-quality supervision. Indeed, models trained with the geometric consistency loss create considerably more surface in areas that were not supervised, in one case filling in 67.97% (8010 m²) of an unlabeled input region. Finally, we assess the practical applicability of our method at large scale through experiments over the full scope of the KITTI Odometry dataset.
Broadly, as a measure of effectiveness, our model reduces gross errors by 45.3–77.5%, up to five times more than previous work. We also assess the practical applicability of our method to 3D reconstruction at large scales and find that, compared to the baseline, our model shows better stability in correctness when improving completeness of surfaces, and is effective in reducing median total error by up to 21.8 cm.
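
The abstract describes two ideas that a small example can make concrete: the network's target is the difference in inverse-depth between the low-quality and reference views, and the loss is weighted to emphasize edges. The following is a minimal NumPy sketch under stated assumptions, not the paper's formulation: the function names, the gradient-magnitude weighting, and the `alpha` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def edge_weights(ref_inv_depth, alpha=4.0):
    """Per-pixel weights that grow with the gradient of the reference view.

    Assumption: edges are approximated by the normalized gradient magnitude
    of the reference inverse-depth image; `alpha` sets the extra emphasis.
    """
    gy, gx = np.gradient(ref_inv_depth)
    mag = np.hypot(gx, gy)
    peak = mag.max()
    if peak > 0:
        mag = mag / peak
    return 1.0 + alpha * mag


def edge_weighted_l1(pred_residual, low_inv_depth, ref_inv_depth,
                     valid=None, alpha=4.0):
    """Weighted L1 loss between predicted and true inverse-depth residuals."""
    # The regression target is the inverse-depth difference between views.
    target = ref_inv_depth - low_inv_depth
    weights = edge_weights(ref_inv_depth, alpha)
    if valid is None:
        # Assumption: every pixel has reference supervision unless masked.
        valid = np.ones(target.shape, dtype=bool)
    err = np.abs(pred_residual - target) * weights
    return err[valid].sum() / max(weights[valid].sum(), 1e-12)


def correct(low_inv_depth, pred_residual):
    """Apply a predicted residual to the low-quality inverse-depth view."""
    return low_inv_depth + pred_residual


# Toy scene: a vertical step edge in the reference inverse-depth image.
ref = np.zeros((8, 8))
ref[:, 4:] = 1.0
low = 0.5 * ref           # a deliberately poor low-quality view
perfect = ref - low       # the residual an ideal network would predict
corrected = correct(low, perfect)
```

With this weighting, the same residual error costs more at a pixel on the step edge than at a pixel in a flat region, which is one simple way to discourage over-smoothed predictions.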
