Uncertainty-Aware Deep Multi-View Photometric Stereo

Abstract

This paper presents a simple and effective solution to the longstanding classical multi-view photometric stereo (MVPS) problem. It is well-known that photometric stereo (PS) is excellent at recovering high-frequency surface details, whereas multi-view stereo (MVS) can help remove the low-frequency distortion due to PS and retain the global geometry of the shape. This paper proposes an approach that can effectively utilize such complementary strengths of PS and MVS. Our key idea is to combine them suitably while considering the per-pixel uncertainty of their estimates. To this end, we estimate per-pixel surface normals and depth using an uncertainty-aware deep-PS network and deep-MVS network, respectively. Uncertainty modeling helps select reliable surface normal and depth estimates at each pixel which then act as a true representative of the dense surface geometry. At each pixel, our approach either selects or discards deep-PS and deep-MVS network prediction depending on the prediction uncertainty measure. For dense, detailed, and precise inference of the object's surface profile, we propose to learn the implicit neural shape representation via a multilayer perceptron (MLP). Our approach encourages the MLP to converge to a natural zero-level set surface using the confident prediction from deep-PS and deep-MVS networks, providing superior dense surface reconstruction. Extensive experiments on the DiLiGenT-MV benchmark dataset show that our method provides high-quality shape recovery with a much lower memory footprint while outperforming almost all of the existing approaches.
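
The per-pixel select-or-discard step described above can be sketched as follows. This is only an illustration under stated assumptions: the scalar uncertainty maps, the threshold `tau`, and the function name `select_confident` are hypothetical, since the abstract does not specify the uncertainty measure or the selection rule.

```python
import numpy as np

def select_confident(values, uncertainty, tau):
    """Keep predictions whose uncertainty is below tau; mark the rest as NaN.

    values: (H, W) depth map or (H, W, 3) normal map.
    uncertainty: (H, W) per-pixel uncertainty (hypothetical scalar measure).
    tau: hypothetical threshold; the abstract does not state one.
    """
    mask = uncertainty < tau
    # Broadcast the 2-D mask over a trailing channel axis if there is one.
    m = mask if values.ndim == 2 else mask[..., None]
    kept = np.where(m, values, np.nan)
    return mask, kept

# Toy 2x2 example with made-up numbers.
depth = np.array([[1.0, 1.1], [0.9, 1.2]])
depth_unc = np.array([[0.05, 0.40], [0.10, 0.02]])
normals = np.tile(np.array([0.0, 0.0, 1.0]), (2, 2, 1))
normal_unc = np.array([[0.30, 0.01], [0.02, 0.50]])

depth_mask, depth_kept = select_confident(depth, depth_unc, tau=0.2)
normal_mask, normals_kept = select_confident(normals, normal_unc, tau=0.2)
```

In this sketch, the surviving depth and normal samples would be the only ones passed on as supervision for the implicit surface fit; discarded pixels contribute nothing.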

Similar Papers
  • Conference Article
  • Cite count: 20
  • 10.1109/wacv56688.2023.00314
Multi-View Photometric Stereo Revisited
  • Jan 1, 2023
  • Berk Kaya + 4 more

Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images. Although popular methods for MVPS can provide outstanding results, they are often complex to execute and limited to isotropic material objects. To address such limitations, we present a simple, practical approach to MVPS, which works well for isotropic as well as other object material types such as anisotropic and glossy. The proposed approach in this paper exploits the benefit of uncertainty modeling in a deep neural network for a reliable fusion of photometric stereo (PS) and multi-view stereo (MVS) network predictions. Yet, contrary to the recently proposed state-of-the-art, we introduce neural volume rendering methodology for a trustworthy fusion of MVS and PS measurements. The advantage of introducing neural volume rendering is that it helps in the reliable modeling of objects with diverse material types, where existing MVS methods, PS methods, or both may fail. Furthermore, it allows us to work on neural 3D shape representation, which has recently shown outstanding results for many geometric processing tasks. Our suggested new loss function aims to fit the zero level set of the implicit neural function using the most certain MVS and PS network predictions coupled with weighted neural volume rendering cost. The proposed approach shows state-of-the-art results when tested extensively on several benchmark datasets.

  • Research Article
  • Cite count: 4
  • 10.1007/s00371-017-1430-5
Multi-view photometric stereo using surface deformation
  • Aug 28, 2017
  • The Visual Computer
  • Jiangbin Gan + 4 more

This paper presents a hybrid approach for 3D reconstruction by fusing photometric stereo and multi-view stereo. The 3D surface is obtained by capturing a set of images taken from different viewpoints under time-varying illuminations. Key factors in the reconstruction process are surface normals that are obtained from photometric stereo. The surface is initialized by integrating the normals and then refined by performing iterative deformations on the initial surface, thereby optimizing image and normal consistency in multiple views. Benefiting from the deformation approach, we are able to perform image and normal consistency optimization without using matching windows. Instead, the complete surface is always back-projected. This makes the proposed approach much simpler and more robust compared to window-based approaches, which typically require global optimization with constraints on neighboring windows. Experiments on real-world data and ground-truth data show that for diffuse midsized objects without large depth discontinuities our approach improves the accuracy of the reconstructions compared to existing approaches.

  • Conference Article
  • Cite count: 57
  • 10.1109/iccv.2019.00114
A Differential Volumetric Approach to Multi-View Photometric Stereo
  • Oct 1, 2019
  • Fotios Logothetis + 2 more

Highly accurate 3D volumetric reconstruction is still an open research topic where the main difficulty is usually related to merging some rough estimations with high-frequency details. One of the most promising methods is the fusion of multi-view stereo and photometric stereo images. Besides the intrinsic difficulties that multi-view stereo and photometric stereo each face in order to work reliably, supplementary problems arise when they are considered together. In this work, we present a volumetric approach to the multi-view photometric stereo problem. The key point of our method is the signed distance field parameterisation and its relation to the surface normal. This is exploited in order to obtain a linear partial differential equation which is solved in a variational framework that combines multiple images from multiple points of view in a single system. In addition, the volumetric approach is naturally implemented on an octree, which allows for fast ray-tracing that reliably alleviates occlusions and cast shadows. Our approach is evaluated on synthetic and real datasets and achieves state-of-the-art results.
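
The relation between a signed distance field (SDF) and the surface normal mentioned above is the standard one: the normal is the normalized gradient of the SDF. A minimal sketch follows, using a unit sphere as a stand-in shape and central finite differences. Both are illustrative assumptions, not the paper's variational solver or octree implementation, and `sphere_sdf` and `sdf_normal` are hypothetical names.

```python
import numpy as np

def sphere_sdf(p, radius=1.0):
    """Signed distance from point(s) p to a sphere centered at the origin."""
    return np.linalg.norm(p, axis=-1) - radius

def sdf_normal(sdf, p, eps=1e-5):
    """Approximate the unit normal at p as the normalized SDF gradient."""
    p = np.asarray(p, dtype=float)
    grad = np.empty_like(p)
    for i in range(3):
        step = np.zeros(3)
        step[i] = eps
        # Central difference along axis i.
        grad[..., i] = (sdf(p + step) - sdf(p - step)) / (2.0 * eps)
    return grad / np.linalg.norm(grad, axis=-1, keepdims=True)

# A point on the unit sphere; its normal should equal the point itself.
point = np.array([0.6, 0.0, 0.8])
normal = sdf_normal(sphere_sdf, point)
```

For a sphere centered at the origin the outward normal at a surface point is the point's own direction, which makes this an easy case to check by hand.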

  • Conference Article
  • Cite count: 31
  • 10.1109/wacv51458.2022.00402
Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo
  • Jan 1, 2022
  • Berk Kaya + 4 more

We present a modern solution to the multi-view photometric stereo (MVPS) problem. Our work suitably exploits the image formation model in an MVPS experimental setup to recover the dense 3D reconstruction of an object from images. We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry. Contrary to the previous multi-staged framework to MVPS, where the position, iso-depth contours, or orientation measurements are estimated independently and then fused later, our method is simple to implement and realize. Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network. We render the MVPS images by considering the object's surface normals for each 3D sample point along the viewing direction rather than explicitly using the density gradient in the volume space via 3D occupancy information. We optimize the proposed neural radiance field representation for the MVPS setup efficiently using a fully connected deep network to recover the 3D geometry of an object. Extensive evaluation on the DiLiGenT-MV benchmark dataset shows that our method performs better than the approaches that perform only PS or only multi-view stereo (MVS) and provides comparable results against the state-of-the-art multistage fusion methods.

  • Conference Article
  • Cite count: 54
  • 10.1109/iccv.2013.148
Multiview Photometric Stereo Using Planar Mesh Parameterization
  • Dec 1, 2013
  • Jaesik Park + 4 more

We propose a method for accurate 3D shape reconstruction using uncalibrated multiview photometric stereo. A coarse mesh reconstructed using multiview stereo is first parameterized using a planar mesh parameterization technique. Subsequently, multiview photometric stereo is performed in the 2D parameter domain of the mesh, where all geometric and photometric cues from multiple images can be treated uniformly. Unlike traditional methods, there is no need for merging view-dependent surface normal maps. Our key contribution is a new photometric stereo based mesh refinement technique that can efficiently reconstruct meshes with extremely fine geometric details by directly estimating a displacement texture map in the 2D parameter domain. We demonstrate that intricate surface geometry can be reconstructed using several challenging datasets containing surfaces with specular reflections, multiple albedos and complex topologies.

  • Conference Article
  • Cite count: 53
  • 10.1109/iccv.2015.103
Photogeometric Scene Flow for High-Detail Dynamic 3D Reconstruction
  • Dec 1, 2015
  • Paulo F U Gotardo + 3 more

Photometric stereo (PS) is an established technique for high-detail reconstruction of 3D geometry and appearance. To correct for surface integration errors, PS is often combined with multiview stereo (MVS). With dynamic objects, PS reconstruction also faces the problem of computing optical flow (OF) for image alignment under rapid changes in illumination. Current PS methods typically compute optical flow and MVS as independent stages, each one with its own limitations and errors introduced by early regularization. In contrast, scene flow methods estimate geometry and motion, but lack the fine detail from PS. This paper proposes photogeometric scene flow (PGSF) for high-quality dynamic 3D reconstruction. PGSF performs PS, OF, and MVS simultaneously. It is based on two key observations: (i) while image alignment improves PS, PS allows for surfaces to be relit to improve alignment, (ii) PS provides surface gradients that render the smoothness term in MVS unnecessary, leading to truly data-driven, continuous depth estimates. This synergy is demonstrated in the quality of the resulting RGB appearance, 3D geometry, and 3D motion.

  • Conference Article
  • Cite count: 7
  • 10.1109/3dimpvt.2011.12
Dynamic Shape Capture via Periodical-Illumination Optical Flow Estimation and Multi-view Photometric Stereo
  • May 1, 2011
  • Ying Fu + 2 more

Multi-view photometric stereo is well established for the shape recovery of static objects. However, it is difficult to align motion images under varying illumination so as to perform photometric stereo reconstruction for dynamic objects. To tackle this issue, this paper presents an optical flow estimation approach which works under periodically varying illuminations and, in cooperation with photometric stereo, enables high-quality 3D reconstruction of dynamic objects. Firstly, multi-view images of the moving object are captured under periodically varying illumination by the multi-camera multi-light system. Then, the optical flow is estimated to synthesize images under different illuminations for each viewpoint. Finally, the multi-view photometric stereo technique is employed to get a highly accurate 3D model for each time instant. Experimental results on moving actors demonstrate that temporally successive images under varying illuminations are effectively registered, permitting accurate photometric reconstruction for moving objects.

  • Research Article
  • Cite count: 7
  • 10.3390/s24082400
LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction
  • Apr 9, 2024
  • Sensors (Basel, Switzerland)
  • Weiming Luo + 2 more

With the widespread adoption of modern RGB cameras, an abundance of RGB images is available everywhere. Therefore, multi-view stereo (MVS) 3D reconstruction has been extensively applied across various fields because of its cost-effectiveness and accessibility, which involves multi-view depth estimation and stereo matching algorithms. However, MVS tasks face noise challenges because of natural multiplicative noise and negative gain in algorithms, which reduce the quality and accuracy of the generated models and depth maps. Traditional MVS methods often struggle with noise, relying on assumptions that do not always hold true under real-world conditions, while deep learning-based MVS approaches tend to suffer from high noise sensitivity. To overcome these challenges, we introduce LNMVSNet, a deep learning network designed to enhance local feature attention and fuse features across different scales, aiming for low-noise, high-precision MVS 3D reconstruction. Through extensive evaluation of multiple benchmark datasets, LNMVSNet has demonstrated its superior performance, showcasing its ability to improve reconstruction accuracy and completeness, especially in the recovery of fine details and clear feature delineation. This advancement brings hope for the widespread application of MVS, ranging from precise industrial part inspection to the creation of immersive virtual environments.

  • Conference Article
  • Cite count: 2
  • 10.1109/3dtv.2009.5069625
Accurate 3D reconstruction via surface-consistency
  • May 1, 2009
  • Chenglei Wu + 2 more

We present an algorithm that fuses multi-view stereo (MVS) and photometric stereo to reconstruct 3D models of objects filmed by multiple cameras under varying illuminations. Firstly, we obtain the surface normal scaled by albedo for each view through photometric stereo techniques. Then, based on the scaled normal, a new correspondence matching method, namely the surface-consistency metric, is proposed to acquire accurate 3D positions of pixels through triangulation. After filtering the point cloud, a Poisson surface reconstruction is applied to obtain a watertight mesh. The algorithm has been implemented based on our multi-camera and multi-light acquisition system. We validate the method by complete reconstruction of challenging real objects and show experimentally that this technique can greatly improve on previous MVS results.
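
The first step above, recovering a surface normal scaled by albedo, can be illustrated with the classical Lambertian least-squares formulation of photometric stereo. This is a sketch under assumptions: known distant light directions, no shadows or specularities, and hypothetical names (`albedo_scaled_normals`, the synthetic lights and pixels). The abstract does not say which photometric stereo technique the authors actually use.

```python
import numpy as np

def albedo_scaled_normals(intensities, light_dirs):
    """Solve I = L @ b per pixel, where b = albedo * normal (Lambertian model).

    intensities: (K, P) observations for K lights and P pixels.
    light_dirs: (K, 3) unit light directions (assumed known).
    Returns b (P, 3), albedo (P,), and unit normals (P, 3).
    """
    b, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, P)
    b = b.T
    albedo = np.linalg.norm(b, axis=1)
    normals = b / np.maximum(albedo, 1e-12)[:, None]
    return b, albedo, normals

# Synthetic, noise-free data: four lights, two pixels (made-up values).
L = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.8],
              [0.0, 0.6, 0.8],
              [-0.6, 0.0, 0.8]])
n_true = np.array([[0.0, 0.0, 1.0],
                   [0.6, 0.0, 0.8]])
rho_true = np.array([0.5, 0.9])
I = L @ (rho_true[:, None] * n_true).T  # (4, 2), all entries positive

b, albedo, normals = albedo_scaled_normals(I, L)
```

With at least three non-coplanar lights the system is well determined, so the noise-free example recovers the albedo and normals it was generated from.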

  • Research Article
  • Cite count: 1
  • 10.3169/itej.64.112
Efficient Estimation and Representation of 3D Models Using Sensor Fusion (センサフュージョンによる効率的な3次元モデルの推定と表現)
  • Jan 1, 2010
  • The Journal of The Institute of Image Information and Television Engineers
  • Tomoaki Higo + 2 more

We propose a novel method for 3d modeling using a fusion of a laser range sensor, a camera, and a flashlight. This combination provides dense normals and surface colors that can be mapped on a 3d model, whereas conventional sensors only output point clouds of the 3d geometry. Furthermore, the fusion enables formulations to be made simply and practically. Multi-view photometric stereo is used for estimating the fine normal distribution with a basic shape measured by the laser range sensor. Our photometric stereo can easily handle near-light formulation and specularity. Detailed surfaces can be shown by applying the normal map as bump mapping to the basic shape. Robust estimation and clustering are used for estimating reflection parameters. Results demonstrate that our method can estimate highly accurate reflection parameters and provide fine surface appearances using only a small amount of data. The effectiveness of our method is shown with an application of 3d content.

  • Conference Article
  • Cite count: 97
  • 10.1109/cvpr.2013.195
Multi-view Photometric Stereo with Spatially Varying Isotropic Materials
  • Jun 1, 2013
  • Zhenglong Zhou + 2 more

We present a method to capture both 3D shape and spatially varying reflectance with a multi-view photometric stereo technique that works for general isotropic materials. Our data capture setup is simple, which consists of only a digital camera and a handheld light source. From a single viewpoint, we use a set of photometric stereo images to identify surface points with the same distance to the camera. We collect this information from multiple viewpoints and combine it with structure-from-motion to obtain a precise reconstruction of the complete 3D shape. The spatially varying isotropic bidirectional reflectance distribution function (BRDF) is captured by simultaneously inferring a set of basis BRDFs and their mixing weights at each surface point. According to our experiments, the captured shapes are accurate to 0.3 millimeters. The captured reflectance has relative root-mean-square error (RMSE) of 9%. © 2013 IEEE.
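
The idea of representing a spatially varying BRDF as per-point mixtures of a few basis BRDFs can be sketched as a matrix product. Everything concrete here is an assumption for illustration: the tabulated bases, the three angular bins, and the weights are made-up values, and the paper's joint inference of bases and weights is not shown.

```python
import numpy as np

# Hypothetical tabulated basis BRDFs: K = 2 bases sampled at A = 3 angular bins.
basis = np.array([[1.0, 1.0, 1.0],    # diffuse-like basis (constant)
                  [4.0, 1.0, 0.0]])   # specular-like basis (peaked)

# Hypothetical mixing weights: P = 2 surface points x K bases, rows sum to 1.
weights = np.array([[1.0, 0.0],
                    [0.5, 0.5]])

# Each point's BRDF is a weighted combination of the shared bases.
per_point_brdf = weights @ basis  # shape (P, A)
```

The appeal of this representation is that only the small weight vector varies across the surface, while the bases are shared by all points.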

  • Book Chapter
  • Cite count: 67
  • 10.1007/978-3-031-19769-7_16
PS-NeRF: Neural Inverse Rendering for Multi-view Photometric Stereo
  • Jan 1, 2022
  • Wenqi Yang + 4 more

Traditional multi-view photometric stereo (MVPS) methods are often composed of multiple disjoint stages, resulting in noticeable accumulated errors. In this paper, we present a neural inverse rendering method for MVPS based on implicit representation. Given multi-view images of a non-Lambertian object illuminated by multiple unknown directional lights, our method jointly estimates the geometry, materials, and lights. Our method first employs multi-light images to estimate per-view surface normal maps, which are used to regularize the normals derived from the neural radiance field. It then jointly optimizes the surface normals, spatially-varying BRDFs, and lights based on a shadow-aware differentiable rendering layer. After optimization, the reconstructed object can be used for novel-view rendering, relighting, and material editing. Experiments on both synthetic and real datasets demonstrate that our method achieves far more accurate shape reconstruction than existing MVPS and neural rendering methods. Our code and model can be found at https://ywq.github.io/psnerf.

Keywords: Multi-view photometric stereo, Inverse rendering, Neural rendering

  • Research Article
  • Cite count: 92
  • 10.1109/tip.2020.2968818
Multi-View Photometric Stereo: A Robust Solution and Benchmark Dataset for Spatially Varying Isotropic Materials.
  • Jan 1, 2020
  • IEEE Transactions on Image Processing
  • Min Li + 5 more

We present a method to capture both 3D shape and spatially varying reflectance with a multi-view photometric stereo (MVPS) technique that works for general isotropic materials. Our algorithm is suitable for perspective cameras and nearby point light sources. Our data capture setup is simple, which consists of only a digital camera, some LED lights, and an optional automatic turntable. From a single viewpoint, we use a set of photometric stereo images to identify surface points with the same distance to the camera. We collect this information from multiple viewpoints and combine it with structure-from-motion to obtain a precise reconstruction of the complete 3D shape. The spatially varying isotropic bidirectional reflectance distribution function (BRDF) is captured by simultaneously inferring a set of basis BRDFs and their mixing weights at each surface point. In experiments, we demonstrate our algorithm with two different setups: a studio setup for highest precision and a desktop setup for best usability. According to our experiments, under the studio setting, the captured shapes are accurate to 0.5 millimeters and the captured reflectance has a relative root-mean-square error (RMSE) of 9%. We also quantitatively evaluate state-of-the-art MVPS on a newly collected benchmark dataset, which is publicly available for inspiring future research.

  • Research Article
  • Cite count: 29
  • 10.1109/tits.2022.3193421
Normal Assisted Pixel-Visibility Learning With Cost Aggregation for Multiview Stereo
  • Dec 1, 2022
  • IEEE Transactions on Intelligent Transportation Systems
  • Wei Tong + 6 more

Multiple-View Stereo (MVS) aims to reconstruct the dense 3D representations of scenes. MVS has potential applications in the fields of autonomous driving (unstructured environment construction) and robotic navigation (visual-inertial navigation). To mitigate the error of depth estimation in low-textured or occluded regions, this work proposes a two-stage multi-view stereo network for fast and accurate depth estimation. The improvements of this work over the state of the art are as follows: 1) Sparse costs are constructed to jointly predict the initial depth map and surface normal by cost regularization, which proves that the surface normals can be estimated in this way with low memory consumption. 2) A new edge refinement block is developed to refine the coarse surface normal to obtain a fine-grained surface normal map. 3) Instead of using the general variance-based metric to equally aggregate cost, a new content-adaptive cost aggregation mechanism based on the similarity of the neighboring surface normal is designed for reliable cost aggregation. To the best of our knowledge, the proposed work is the first trainable network that leverages surface normal as guidance to capture neighboring pixel-visibility, which is an effective supplement to existing depth/normal estimation frameworks. Experimental results indicate that our method can not only achieve accurate depth estimation for scene perception but also make no concession to the real-time performance and limited memory bottleneck.

Multiple-view stereo (MVS) aims to reconstruct the dense 3D representations of scenes. It is widely used in the fields of industrial measurement, autonomous driving, and robotic navigation. To mitigate the error of depth estimation in challenging scenarios, this work proposes a two-stage multi-view stereo network for fast and accurate depth estimation. Our method is the first trainable network that leverages surface normal as pixel-visibility guidance to aggregate reliable cost, which could achieve accurate depth estimation and provide the perception ability for the robot. The proposed method has great potential in the fields of 3D reconstruction, industrial measurement, and robotic navigation to estimate real-time and accurate depth with limited memory consumption.

  • Conference Article
  • Cite count: 33
  • 10.1109/cvpr.2016.591
Just Look at the Image: Viewpoint-Specific Surface Normal Prediction for Improved Multi-View Reconstruction
  • Jun 1, 2016
  • Silvano Galliani + 1 more

We present a multi-view reconstruction method that combines conventional multi-view stereo (MVS) with appearance-based normal prediction, to obtain dense and accurate 3D surface models. Reliable surface normals reconstructed from multi-view correspondence serve as training data for a convolutional neural network (CNN), which predicts continuous normal vectors from raw image patches. By training from known points in the same image, the prediction is specifically tailored to the materials and lighting conditions of the particular scene, as well as to the precise camera viewpoint. It is therefore a lot easier to learn than generic single-view normal estimation. The estimated normal maps, together with the known depth values from MVS, are integrated to dense depth maps, which in turn are fused into a 3D model. Experiments on the DTU dataset show that our method delivers 3D reconstructions with the same accuracy as MVS, but with significantly higher completeness.
