Abstract

Conventional multi-view stereo (MVS) approaches based on photo-consistency measures are generally robust, yet they often fail to compute valid per-pixel depth estimates in low-textured areas of the scene. In this study, a novel approach is proposed to tackle this challenge by integrating semantic priors into a PatchMatch-based MVS pipeline in order to increase confidence and support depth and normal map estimation. Semantic class labels on image pixels are used to impose class-specific geometric constraints during multi-view stereo, optimising the depth estimation in weakly supported, textureless areas, which are commonly present in urban scenarios of building facades, indoor scenes, or aerial datasets. After dominant shapes, e.g., planes, are detected with RANSAC, an adjusted cost function is introduced that combines and weighs both photometric and semantic scores, thus propagating more accurate depth estimates. Being adaptive, it fills in apparent information gaps and smooths local roughness in problematic regions while at the same time preserving important details. Experiments on benchmark and custom datasets demonstrate the effectiveness of the presented approach.
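As a rough illustration of the two ingredients named in the abstract, the sketch below fits a plane to 3D points with a basic RANSAC loop and then blends a photometric cost with a plane-prior cost. It is only a minimal sketch under stated assumptions: the function names, the linear blend, and the `weight`, `scale`, and `inlier_thresh` parameters are hypothetical and are not taken from the paper, whose actual cost function is not given in this section.

```python
import numpy as np


def fit_plane_ransac(points, n_iters=200, inlier_thresh=0.01, seed=0):
    """Fit a plane n.x + d = 0 to an (N, 3) point array with a basic RANSAC loop.

    Returns ((unit_normal, d), inlier_mask), or (None, mask) if every sample
    was degenerate. All parameter values are illustrative assumptions.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        # Draw a minimal sample of three distinct points.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # collinear sample, no unique plane
            continue
        normal = normal / norm
        d = -normal @ sample[0]
        # Points closer to the plane than the threshold count as inliers.
        inliers = np.abs(points @ normal + d) < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
            best_plane = (normal, d)
    return best_plane, best_inliers


def combined_cost(photo_cost, point, plane, weight=0.5, scale=0.05):
    """Blend a photometric cost in [0, 1] with a point-to-plane prior cost.

    The linear blend is an assumption made for illustration only.
    """
    normal, d = plane
    dist = abs(normal @ np.asarray(point, dtype=float) + d)
    prior_cost = min(dist / scale, 1.0)  # clamp the prior term to [0, 1]
    return (1.0 - weight) * photo_cost + weight * prior_cost
```

In this sketch, a depth hypothesis whose 3D point lies on the detected plane keeps only the down-weighted photometric term, while a hypothesis far from the plane is penalised. In a semantic setting, `weight` would presumably depend on the class label of the pixel (for example, higher for facade or floor pixels), but that mapping is likewise an assumption.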

Highlights

  • Multi-View Stereo (MVS) algorithms address the problem of generating a complete and dense 3D representation of the scene, given the camera calibration parameters and poses in 3D space, commonly obtained by Structure from Motion (SfM) pipelines

  • We present a new framework in which semantic information is used to support MVS and improve 3D point cloud accuracy

  • Our method cannot be directly applied to benchmark 3D reconstruction datasets such as ETH3D [3] or Tanks and Temples [2], as other MVS algorithms are [22,23,66], because these MVS datasets lack accompanying labelled data


Summary

Introduction

Multi-View Stereo (MVS) algorithms address the problem of generating a complete and dense 3D representation of the scene, given the camera calibration parameters and poses in 3D space, commonly obtained by Structure from Motion (SfM) pipelines. Such procedures have become common practice for numerous applications that span from industrial and monitoring scenarios to cultural heritage, city mapping and localization, or autonomous navigation. Seitz et al. [1] proposed a taxonomy under which MVS pipelines are grouped into feature point growing-based methods, voxel-based methods, surface evolution-based methods and depth map merging-based methods. The latter techniques, where depth maps are fused together into a point cloud or a volumetric representation of the scene, are widely used in large-scale or high-precision applications due to their efficiency and scalability [2,3]. As with other depth estimation methods, PatchMatch relies on photo-consistency

