Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo

Abstract

We present a modern solution to the multi-view photometric stereo (MVPS) problem. Our work exploits the image formation model of an MVPS experimental setup to recover a dense 3D reconstruction of an object from images. We obtain the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry. In contrast to previous multi-stage frameworks for MVPS, where position, iso-depth contours, or orientation measurements are estimated independently and fused later, our method is simple to implement and realize. Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network. We render the MVPS images by considering the object's surface normals for each 3D sample point along the viewing direction, rather than explicitly using the density gradient in the volume space via 3D occupancy information. We optimize the proposed neural radiance field representation for the MVPS setup efficiently using a fully connected deep network to recover the 3D geometry of an object. Extensive evaluation on the DiLiGenT-MV benchmark dataset shows that our method performs better than approaches that perform only PS or only multi-view stereo (MVS), and provides results comparable to the state-of-the-art multi-stage fusion methods.
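
The rendering step described above can be illustrated with a minimal NumPy sketch. It uses the standard neural radiance field quadrature (per-sample opacity, accumulated transmittance, weighted color sum) and feeds a surface normal into the color function for every sample along the ray. Everything here is an assumption made for illustration: the function names `composite_along_ray` and `toy_radiance`, the hard-coded sphere density, and the single camera-facing normal are hypothetical stand-ins, not the network or normal estimator used in the paper.

```python
import numpy as np

def composite_along_ray(sigmas, colors, t_vals):
    """Standard NeRF-style quadrature along one ray.

    sigmas: (N,) volume densities, colors: (N, 3), t_vals: (N,) sample depths.
    Returns the composited RGB value and the per-sample weights.
    """
    # Distance between consecutive samples; the last interval is "infinite".
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)
    alphas = 1.0 - np.exp(-sigmas * deltas)            # per-sample opacity
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

def toy_radiance(normals, view_dir):
    """Hypothetical stand-in for the color MLP (assumption, not the paper's model).

    Shades each sample by the cosine between its normal and the direction
    back toward the camera, so color depends on the supplied normal.
    """
    shade = np.clip(normals @ (-view_dir), 0.0, 1.0)
    return np.repeat(shade[:, None], 3, axis=1)

# One ray looking down +z at a unit sphere centered at the origin.
origin = np.array([0.0, 0.0, -2.0])
view_dir = np.array([0.0, 0.0, 1.0])
t_vals = np.linspace(0.0, 4.0, 64)
points = origin + t_vals[:, None] * view_dir

# Assumed density field: constant inside the sphere, zero outside.
sigmas = np.where(np.linalg.norm(points, axis=1) < 1.0, 50.0, 0.0)

# Assumed per-pixel normal (as a PS network might supply), reused for
# every sample on the ray instead of differentiating the density field.
pixel_normal = np.array([0.0, 0.0, -1.0])
normals = np.tile(pixel_normal, (len(t_vals), 1))

colors = toy_radiance(normals, view_dir)
rgb, weights = composite_along_ray(sigmas, colors, t_vals)
print(rgb, weights.sum())
```

In this toy setup the normal faces the camera, so every sample is fully lit and the composited color is close to white. Tilting `pixel_normal` away from the camera darkens the pixel without touching the density field, which is the kind of decoupling the abstract alludes to.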

Similar Papers
  • Conference Article
  • Citations: 31
  • 10.1109/cvpr52688.2022.01227
Uncertainty-Aware Deep Multi-View Photometric Stereo
  • Jun 1, 2022
  • Berk Kaya + 4 more

This paper presents a simple and effective solution to the longstanding classical multi-view photometric stereo (MVPS) problem. It is well-known that photometric stereo (PS) is excellent at recovering high-frequency surface details, whereas multi-view stereo (MVS) can help remove the low-frequency distortion due to PS and retain the global geometry of the shape. This paper proposes an approach that can effectively utilize such complementary strengths of PS and MVS. Our key idea is to combine them suitably while considering the per-pixel uncertainty of their estimates. To this end, we estimate per-pixel surface normals and depth using an uncertainty-aware deep-PS network and deep-MVS network, respectively. Uncertainty modeling helps select reliable surface normal and depth estimates at each pixel which then act as a true representative of the dense surface geometry. At each pixel, our approach either selects or discards deep-PS and deep-MVS network prediction depending on the prediction uncertainty measure. For dense, detailed, and precise inference of the object's surface profile, we propose to learn the implicit neural shape representation via a multilayer perceptron (MLP). Our approach encourages the MLP to converge to a natural zero-level set surface using the confident prediction from deep-PS and deep-MVS networks, providing superior dense surface reconstruction. Extensive experiments on the DiLiGenT-MV benchmark dataset show that our method provides high-quality shape recovery with a much lower memory footprint while outperforming almost all of the existing approaches.

  • Conference Article
  • Citations: 57
  • 10.1109/iccv.2019.00114
A Differential Volumetric Approach to Multi-View Photometric Stereo
  • Oct 1, 2019
  • Fotios Logothetis + 2 more

Highly accurate 3D volumetric reconstruction is still an open research topic where the main difficulty is usually related to merging some rough estimations with high frequency details. One of the most promising methods is the fusion between multi-view stereo and photometric stereo images. Besides the intrinsic difficulties that multi-view stereo and photometric stereo each face in order to work reliably, supplementary problems arise when they are considered together. In this work, we present a volumetric approach to the multi-view photometric stereo problem. The key point of our method is the signed distance field parameterisation and its relation to the surface normal. This is exploited in order to obtain a linear partial differential equation which is solved in a variational framework that combines multiple images from multiple points of view in a single system. In addition, the volumetric approach is naturally implemented on an octree, which allows for fast ray-tracing that reliably alleviates occlusions and cast shadows. Our approach is evaluated on synthetic and real datasets and achieves state-of-the-art results.

  • Conference Article
  • Citations: 20
  • 10.1109/wacv56688.2023.00314
Multi-View Photometric Stereo Revisited
  • Jan 1, 2023
  • Berk Kaya + 4 more

Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images. Although popular methods for MVPS can provide outstanding results, they are often complex to execute and limited to isotropic material objects. To address such limitations, we present a simple, practical approach to MVPS, which works well for isotropic as well as other object material types such as anisotropic and glossy. The proposed approach in this paper exploits the benefit of uncertainty modeling in a deep neural network for a reliable fusion of photometric stereo (PS) and multi-view stereo (MVS) network predictions. Yet, contrary to the recently proposed state-of-the-art, we introduce neural volume rendering methodology for a trustworthy fusion of MVS and PS measurements. The advantage of introducing neural volume rendering is that it helps in the reliable modeling of objects with diverse material types, where existing MVS methods, PS methods, or both may fail. Furthermore, it allows us to work on neural 3D shape representation, which has recently shown outstanding results for many geometric processing tasks. Our suggested new loss function aims to fit the zero level set of the implicit neural function using the most certain MVS and PS network predictions coupled with weighted neural volume rendering cost. The proposed approach shows state-of-the-art results when tested extensively on several benchmark datasets.

  • Research Article
  • Citations: 4
  • 10.1007/s00371-017-1430-5
Multi-view photometric stereo using surface deformation
  • Aug 28, 2017
  • The Visual Computer
  • Jiangbin Gan + 4 more

This paper presents a hybrid approach for 3D reconstruction by fusing photometric stereo and multi-view stereo. The 3D surface is obtained by capturing a set of images taken from different viewpoints under time-varying illuminations. Key factors in the reconstruction process are surface normals that are obtained from photometric stereo. The surface is initialized by integrating the normals and then refined by performing iterative deformations on the initial surface, thereby optimizing image and normal consistency in multiple views. Benefiting from the employment of the deformation approach, we are able to perform image and normal consistency optimization without using matching windows. Instead, the complete surface is always back-projected. This makes the proposed approach much simpler and more robust compared to window-based approaches, which typically require global optimization with constraints on neighboring windows. Experiments on real-world data and ground-truth data show that for diffuse midsized objects without large depth discontinuities our approach improves the accuracy of the reconstructions compared to existing approaches.

  • Conference Article
  • Citations: 53
  • 10.1109/iccv.2015.103
Photogeometric Scene Flow for High-Detail Dynamic 3D Reconstruction
  • Dec 1, 2015
  • Paulo F U Gotardo + 3 more

Photometric stereo (PS) is an established technique for high-detail reconstruction of 3D geometry and appearance. To correct for surface integration errors, PS is often combined with multiview stereo (MVS). With dynamic objects, PS reconstruction also faces the problem of computing optical flow (OF) for image alignment under rapid changes in illumination. Current PS methods typically compute optical flow and MVS as independent stages, each one with its own limitations and errors introduced by early regularization. In contrast, scene flow methods estimate geometry and motion, but lack the fine detail from PS. This paper proposes photogeometric scene flow (PGSF) for high-quality dynamic 3D reconstruction. PGSF performs PS, OF, and MVS simultaneously. It is based on two key observations: (i) while image alignment improves PS, PS allows for surfaces to be relit to improve alignment, (ii) PS provides surface gradients that render the smoothness term in MVS unnecessary, leading to truly data-driven, continuous depth estimates. This synergy is demonstrated in the quality of the resulting RGB appearance, 3D geometry, and 3D motion.

  • Research Article
  • Citations: 7
  • 10.3390/s24082400
LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction
  • Apr 9, 2024
  • Sensors (Basel, Switzerland)
  • Weiming Luo + 2 more

With the widespread adoption of modern RGB cameras, an abundance of RGB images is available everywhere. Therefore, multi-view stereo (MVS) 3D reconstruction has been extensively applied across various fields because of its cost-effectiveness and accessibility, which involves multi-view depth estimation and stereo matching algorithms. However, MVS tasks face noise challenges because of natural multiplicative noise and negative gain in algorithms, which reduce the quality and accuracy of the generated models and depth maps. Traditional MVS methods often struggle with noise, relying on assumptions that do not always hold true under real-world conditions, while deep learning-based MVS approaches tend to suffer from high noise sensitivity. To overcome these challenges, we introduce LNMVSNet, a deep learning network designed to enhance local feature attention and fuse features across different scales, aiming for low-noise, high-precision MVS 3D reconstruction. Through extensive evaluation of multiple benchmark datasets, LNMVSNet has demonstrated its superior performance, showcasing its ability to improve reconstruction accuracy and completeness, especially in the recovery of fine details and clear feature delineation. This advancement brings hope for the widespread application of MVS, ranging from precise industrial part inspection to the creation of immersive virtual environments.

  • Research Article
  • Citations: 1
  • 10.1109/access.2024.3357134
Photometric Stereo Super Resolution via Complex Surface Structure Estimation
  • Jan 1, 2024
  • IEEE Access
  • Han-Nyoung Lee + 1 more

Photometric stereo, which derives per-pixel surface normals from shading cues, faces challenges in capturing high-resolution (HR) images in linear response systems. We address the representation of HR surface normals from low-resolution (LR) photometric stereo images. To represent fine details of the surface normal in the HR domain, we propose a novel plug-in high-frequency representation module named the Complex Surface Structure (CSS) estimator. When combined with a conventional photometric stereo model, CSS is capable of representing intricate surface structures in 2D Fourier space. We show that photometric stereo super-resolution (SR) with our CSS estimator provides high-fidelity surface normal representations in higher resolution from the LR inputs. Experiments demonstrate that our results are quantitatively and qualitatively better than those of the existing deep learning-based SR work.

  • Conference Article
  • Citations: 54
  • 10.1109/iccv.2013.148
Multiview Photometric Stereo Using Planar Mesh Parameterization
  • Dec 1, 2013
  • Jaesik Park + 4 more

We propose a method for accurate 3D shape reconstruction using uncalibrated multiview photometric stereo. A coarse mesh reconstructed using multiview stereo is first parameterized using a planar mesh parameterization technique. Subsequently, multiview photometric stereo is performed in the 2D parameter domain of the mesh, where all geometric and photometric cues from multiple images can be treated uniformly. Unlike traditional methods, there is no need for merging view-dependent surface normal maps. Our key contribution is a new photometric stereo based mesh refinement technique that can efficiently reconstruct meshes with extremely fine geometric details by directly estimating a displacement texture map in the 2D parameter domain. We demonstrate that intricate surface geometry can be reconstructed using several challenging datasets containing surfaces with specular reflections, multiple albedos and complex topologies.

  • Conference Article
  • Citations: 180
  • 10.1109/iccv.2003.1238405
Shape and motion under varying illumination: unifying structure from motion, photometric stereo, and multiview stereo
  • Jan 1, 2003
  • Li Zhang + 3 more

We present an algorithm for computing optical flow, shape, motion, lighting, and albedo from an image sequence of a rigidly-moving Lambertian object under distant illumination. The problem is formulated in a manner that subsumes structure from motion, multiview stereo, and photometric stereo as special cases. The algorithm utilizes both spatial and temporal intensity variation as cues: the former constrains flow and the latter constrains surface orientation; combining both cues enables dense reconstruction of both textured and textureless surfaces. The algorithm works by iteratively estimating affine camera parameters, illumination, shape, and albedo in an alternating fashion. Results are demonstrated on videos of hand-held objects moving in front of a fixed light and camera.

  • Research Article
  • Citations: 4
  • 10.3390/s20216261
Deep Photometric Stereo Network with Multi-Scale Feature Aggregation
  • Nov 3, 2020
  • Sensors (Basel, Switzerland)
  • Chanki Yu + 1 more

We present photometric stereo algorithms robust to non-Lambertian reflection, which are based on a convolutional neural network in which surface normals of objects with complex geometry and surface reflectance are estimated from a given set of an arbitrary number of images. These images are taken from the same viewpoint under different directional illumination conditions. The proposed method focuses on surface normal estimation, where multi-scale feature aggregation is proposed to obtain a more accurate surface normal, and max pooling is adopted to obtain an intermediate order-agnostic representation in the photometric stereo scenario. The proposed multi-scale feature aggregation scheme using feature concatenation is easily incorporated into existing photometric stereo network architectures. Our experiments were performed with the DiLiGenT photometric stereo benchmark dataset consisting of ten real objects, and they demonstrated that the accuracies of our calibrated and uncalibrated photometric stereo approaches were improved over those of baseline methods. In particular, our experiments also demonstrated that our uncalibrated photometric stereo outperformed the state-of-the-art method. Our work is the first to consider multi-scale feature aggregation in photometric stereo, and we showed that our proposed multi-scale fusion scheme estimated the surface normal accurately and was beneficial to improving performance.

  • Research Article
  • Citations: 8
  • 10.1109/21.400507
A neural network approach to photometric stereo inversion of real-world reflectance maps for extracting 3-D shapes of objects
  • Jan 1, 1995
  • IEEE Transactions on Systems, Man, and Cybernetics
  • K.V Rajaram + 2 more

Presents a neural network approach to the problem of photometric stereo inversion of the reflectance maps of real-world objects for the purpose of estimating the 3-D attitudes of the surface patches of objects. As in the photometric stereo approach, here also the observation that there is a one-to-one mapping between the n-tuples of the photometric stereo image intensities and the orientations of the surface normals is valid. A multilayered feedforward neural network with backpropagation training algorithm is used as dimensionality reducer to effectively encode this mapping by associating the two components of surface normals to the observed intensities from three photometric stereo images of the underlying surface patches. The training patterns are sampled from the images of a Gaussian sphere of average reflectance containing both diffuse and specular components. The neural network thus trained has been tested on images of real-world objects with different shapes and reflectance properties. Using the surface normals estimated by the neural network, 3-D shapes of the objects have been reconstructed to a good approximation.

  • Abstract
  • Citations: 1
  • 10.1136/gutjnl-2014-307263.96
PTU-022 A Novel Photometric Stereo Imaging Sensor For Endoscopy Imaging: Proof Of Concept Studies On A Porcine Model
  • Jun 1, 2014
  • Gut
  • A Poullis + 4 more

Introduction: The American Society of Gastroenterology Endoscopy led Preservation and Incorporation of Valuable Endoscopic Innovations initiative has identified real time polyp diagnosis as one of the next major technology-driven changes in...

  • Conference Article
  • Citations: 11
  • 10.1109/wacv56688.2023.00305
nLMVS-Net: Deep Non-Lambertian Multi-View Stereo
  • Jan 1, 2023
  • Kohei Yamashita + 3 more

We introduce a novel multi-view stereo (MVS) method that can simultaneously recover not just per-pixel depth but also surface normals, together with the reflectance of textureless, complex non-Lambertian surfaces captured under known but natural illumination. Our key idea is to formulate MVS as an end-to-end learnable network, which we refer to as nLMVS-Net, that seamlessly integrates radiometric cues to leverage surface normals as view-independent surface features for learned cost volume construction and filtering. It first estimates surface normals as pixel-wise probability densities for each view with a novel shape-from-shading network. These per-pixel surface normal densities and the input multi-view images are then input to a novel cost volume filtering network that learns to recover per-pixel depth and surface normal. The reflectance is also explicitly estimated by alternating with geometry reconstruction. Extensive quantitative evaluations on newly established synthetic and real-world datasets show that nLMVS-Net can robustly and accurately recover the shape and reflectance of complex objects in natural settings.

  • Research Article
  • Citations: 4
  • 10.1016/j.cviu.2022.103384
Single-camera 3D head fitting for mixed reality clinical applications
  • Feb 9, 2022
  • Computer Vision and Image Understanding
  • Tejas Mane + 4 more


  • Book Chapter
  • Citations: 5
  • 10.3233/faia240440
MVSBoost: An Efficient Point Cloud-Based 3D Reconstruction
  • Sep 25, 2024
  • Frontiers in artificial intelligence and applications
  • Umair Haroon + 3 more

Efficient and accurate 3D reconstruction is crucial for various applications, including augmented and virtual reality, medical imaging, and cinematic special effects. While traditional Multi-View Stereo (MVS) systems have been fundamental in these applications, using neural implicit fields in implicit 3D scene modeling has introduced new possibilities for handling complex topologies and continuous surfaces. However, neural implicit fields often suffer from computational inefficiencies, overfitting, and heavy reliance on data quality, limiting their practical use. This paper presents an enhanced MVS framework that integrates multi-view 360-degree imagery with robust camera pose estimation via Structure from Motion (SfM) and advanced image processing for point cloud densification, mesh reconstruction, and texturing. Our approach significantly improves upon traditional MVS methods, offering superior accuracy and precision as validated using Chamfer distance metrics on the Realistic Synthetic 360 dataset. The developed MVS technique enhances the detail and clarity of 3D reconstructions and demonstrates superior computational efficiency and robustness in complex scene reconstruction, effectively handling occlusions and varying viewpoints. These improvements suggest that our MVS framework can compete with and potentially exceed current state-of-the-art neural implicit field methods, especially in scenarios requiring real-time processing and scalability.
