Weighted Patch-Based Reconstruction: Linking (Multi-view) Stereo to Scale Space
Abstract: Surface reconstruction using patch-based multi-view stereo commonly assumes that the underlying surface is locally planar. This is typically not true, so least-squares fitting of a planar patch leads to systematic errors, which are of particular importance for multi-scale surface reconstruction. In a recent paper [12], we determined the modulation transfer function of a classical patch-based stereo system. Our key insight was that the reconstructed surface is a box-filtered version of the original surface. Since the box filter is not a true low-pass filter, this causes high-frequency artifacts. In this paper, we propose an extended reconstruction model that weights the least-squares fit of the 3D patch. We show that if the weighting function meets specified criteria, the reconstructed surface is the convolution of the original surface with that weighting function. A choice of particular interest is the Gaussian, which is commonly used in image and signal processing but left unexploited by many multi-view stereo algorithms. Finally, we demonstrate the effects of our theoretical findings using experiments on synthetic and real-world data sets.
Keywords: multi-view stereo, multi-scale surface reconstruction
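The statement that a weighted least-squares patch fit behaves like a convolution can be checked numerically in a simplified 1D setting. The sketch below is an illustration under assumptions, not the authors' implementation: it fits a line directly to sampled surface heights (instead of optimizing photo-consistency between images), it uses a symmetric window, and the names `weighted_patch_fit`, `gaussian_weights`, `half_width` and `sigma` are hypothetical.

```python
import numpy as np

def weighted_patch_fit(z, weights):
    """Fit a line a + b*x to every window of z by weighted least squares
    and return the fitted height a at the window centre."""
    half = len(weights) // 2
    x = np.arange(-half, half + 1, dtype=float)
    # Design matrix of a locally planar (here: linear) patch.
    A = np.stack([np.ones_like(x), x], axis=1)
    W = np.diag(weights)
    # Weighted normal equations: (A^T W A) p = A^T W z_window.
    solve = np.linalg.solve(A.T @ W @ A, A.T @ W)
    out = np.empty(len(z) - 2 * half)
    for i in range(len(out)):
        out[i] = (solve @ z[i:i + 2 * half + 1])[0]
    return out

def gaussian_weights(half_width, sigma):
    """Symmetric, unnormalized Gaussian weights on an odd-length window."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    return np.exp(-0.5 * (x / sigma) ** 2)

# Synthetic 1D "surface": a sine wave with a little noise (assumed test data).
rng = np.random.default_rng(0)
surface = np.sin(np.linspace(0, 6 * np.pi, 200)) + 0.1 * rng.standard_normal(200)

w = gaussian_weights(half_width=7, sigma=2.5)
fitted = weighted_patch_fit(surface, w)
# Convolution of the surface with the normalized weighting function.
convolved = np.convolve(surface, w / w.sum(), mode="valid")
max_diff = np.max(np.abs(fitted - convolved))
```

Passing `np.ones(15)` as the weights reproduces the unweighted case, where the same comparison holds against a box filter. The agreement relies on the window being symmetric, which makes the offset and slope terms of the normal equations decouple.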
- Conference Article
14
- 10.1109/icra40945.2020.9197089
- May 1, 2020
Multi-view stereo (MVS) algorithms have been commonly used to model large-scale structures. When processing MVS, image acquisition is an important issue because the reconstruction quality depends heavily on the acquired images. Recently, an explore-then-exploit strategy has been used to acquire images for MVS. This method first constructs a coarse model by exploring an entire scene using a pre-allocated camera trajectory. Then, it rescans the unreconstructed regions from the coarse model. However, this strategy is inefficient because of the frequent overlap of the initial and rescanning trajectories. Furthermore, even given complete image coverage, MVS algorithms do not guarantee an accurate reconstruction result. In this study, we propose a novel view path-planning method based on an online MVS system. This method aims to incrementally construct the target three-dimensional (3D) model in real time. View paths are continually planned based on online feedback from the partially constructed model. The obtained paths fully cover low-quality surfaces while maximizing the reconstruction performance of MVS. Experimental results demonstrate that the proposed method can construct high-quality 3D models with one exploration trial, without any rescanning trial as in the explore-then-exploit method.
- Dissertation
- 10.14711/thesis-991012786067603412
- Jan 1, 2019
Multi-view stereo (MVS) reconstructs 3D representations of the scene from imagery, which is a core problem of computer vision extensively studied for decades. Traditionally, MVS algorithms apply hand-crafted similarity metrics and engineered regularizations to compute dense correspondences. While these methods have shown great results under ideal Lambertian scenarios, classical MVS algorithms still suffer from numerous artifacts. In this thesis, we propose to advance MVS reconstruction using recent deep learning techniques. First, we present an end-to-end deep learning architecture, MVSNet, for depth map inference from multi-view images. The key contribution of this part is the careful integration of multi-view geometry and convolutional neural networks (CNNs). In the network, we extract deep image features and build the 3D cost volume upon the camera frustum via differentiable homography warping. Then, 3D convolutions are applied to regularize and regress the output depth map. We demonstrate on the DTU dataset that MVSNet significantly outperforms previous state-of-the-art methods in both reconstruction completeness and overall quality. Next, we propose to extend the MVSNet architecture for large-scale MVS reconstruction. One major limitation of current learning-based approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes. To this end, we sequentially regularize 2D cost maps via the gated recurrent unit (GRU) rather than regularize the entire 3D cost volume in one go. The GRU regularization dramatically reduces memory consumption and makes high-resolution reconstructions feasible. The proposed R-MVSNet is evaluated on the large-scale Tanks and Temples dataset and achieves results comparable to classical large-scale MVS algorithms. Finally, we establish a large-scale synthetic MVS dataset, BlendedMVS, based on blended images and rendered depth maps.
While several MVS datasets have been proposed, they fail to provide accurate depth and occlusion information as ground truth mesh models are usually incomplete. We therefore establish a new MVS dataset based on model rendering. Textured meshes are first reconstructed from images of different scenes, which are then rendered into color images, depth maps and occlusion maps. We further blend rendered images with input images using high-pass and low-pass filters to generate our training input. Extensive experiments demonstrate that models trained on BlendedMVS achieve significantly better generalization ability compared with models trained on other MVS datasets. In sum, this thesis presents a complete learning-based solution to large-scale multi-view stereopsis, including a current baseline network (MVSNet), its large-scale extension (R-MVSNet) and a large-scale synthetic dataset (BlendedMVS). We bridge the gap between classical MVS reconstructions and recent deep learning techniques and demonstrate the effectiveness of the learning-based MVS through extensive experiments on different datasets.
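The cost-volume and depth-regression steps described in this thesis abstract can be illustrated with a small NumPy sketch. It rests on several assumptions that are not stated in the abstract: the feature volumes are taken as already warped into the reference frustum, the cost is the variance across views, the learned 3D-CNN regularization is skipped entirely, and depth is regressed as the softmax-weighted expectation over depth hypotheses. The function names and the toy data are hypothetical.

```python
import numpy as np

def variance_cost_volume(warped_features):
    """warped_features: (V, D, C, H, W) features of V views at D depth
    hypotheses. Returns the per-channel variance across views, (D, C, H, W)."""
    return warped_features.var(axis=0)

def regress_depth(cost, depth_values):
    """cost: (D, H, W), lower means more consistent.
    Returns the expected depth per pixel, (H, W)."""
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)  # probability over depth hypotheses
    return np.tensordot(depth_values, prob, axes=(0, 0))

# Toy data (assumed): 3 views, 4 depth hypotheses, 2 channels, 2x2 pixels.
# The views agree exactly at depth index 2 and disagree more further away.
V, D, C, H, W = 3, 4, 2, 2, 2
depth_values = np.array([1.0, 2.0, 3.0, 4.0])
feats = np.zeros((V, D, C, H, W))
for v in range(V):
    for d in range(D):
        feats[v, d] = v * abs(d - 2)

cost = variance_cost_volume(feats).mean(axis=1)  # collapse channels: (D, H, W)
# The factor 30 is a stand-in for a confident, regularized cost (assumption).
depth = regress_depth(30.0 * cost, depth_values)
```

Because the expectation is a smooth function of the cost, this kind of regression yields sub-hypothesis depth values and stays differentiable, which is what makes end-to-end training possible.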
- Research Article
46
- 10.1016/j.aei.2023.102196
- Sep 28, 2023
- Advanced Engineering Informatics
Improving completeness and accuracy of 3D point clouds by using deep learning for applications of digital twins to civil structures
- Research Article
1
- 10.1007/s11063-018-9816-6
- Mar 14, 2018
- Neural Processing Letters
Multi-view stereo (MVS) map-based 3D range reconstruction generates 3D ranges by analyzing snapshots of the surroundings taken from different perspectives. Unlike traditional methods, which employ expensive and hard-to-maintain laser range devices to measure the range of real 3D objects, MVS succeeds by seeking the geometric correlations between correspondences in snapshots from different perspectives. Interest in MVS keeps rising thanks to the fast development of digital maps and 3D printing. Several MVS algorithms have been developed and have succeeded in reconstructing 3D ranges. However, most of these algorithms focus mainly on the fusion and merging of different scenes and on surface refinement. Because feature matching algorithms cope poorly with affine-transformed images, current MVS algorithms need a huge number of images with tiny perspective differences. In this paper, we propose a new MVS algorithm that deploys our previously published Affine Invariant Feature Descriptor (AIFD) to detect and match correspondences from different perspectives and applies a homography matrix and segmentation to define the planes of the objects. Thanks to the AIFD and the homography-based projection model, our proposed MVS algorithm outperforms other MVS algorithms in terms of speed and efficiency.
- Research Article
58
- 10.3390/buildings9030070
- Mar 20, 2019
- Buildings
This research presents a novel method for automated construction progress monitoring. Using the proposed method, an accurate and complete 3D point cloud is generated for automatic outdoor and indoor progress monitoring throughout the project duration. In this method, Structure-from-Motion (SFM) and Multi-View-Stereo (MVS) algorithms, coupled with photogrammetric principles for the detection of coded targets, are exploited to generate as-built 3D point clouds. The coded targets are utilized to automatically resolve the scale and increase the accuracy of the point cloud generated using SFM and MVS methods. Having generated the point cloud, the CAD model is generated from the as-built point cloud and compared with the as-planned model. Finally, the quantity of the performed work is determined in two real case study projects. The proposed method is compared to the Structure-from-Motion (SFM)/Clustering Multi-Views Stereo (CMVS)/Patch-based Multi-View Stereo (PMVS) algorithm, a common method for generating 3D point cloud models. The proposed photogrammetric Multi-View Stereo method reveals an accuracy of around 99 percent and generates less noise than the SFM/CMVS/PMVS algorithm. It is observed that the proposed method substantially improves the accuracy of the generated point cloud compared to the SFM/CMVS/PMVS algorithm. It is believed that the proposed method may present a novel and robust tool for automated progress monitoring in construction projects.
- Conference Article
42
- 10.1109/cvpr.2008.4587688
- Jun 1, 2008
The Middlebury multi-view stereo evaluation clearly shows that the quality and speed of most multi-view stereo algorithms depend significantly on the number and selection of input images. In general, not all input images contribute equally to the quality of the output model, since several images may often contain similar and hence overly redundant visual information. This leads to unnecessarily increased processing times. On the other hand, a certain degree of redundancy can help to improve the reconstruction in more “difficult” regions of a model. In this paper we propose an image selection scheme for multi-view stereo which results in improved reconstruction quality compared to uniformly distributed views. Our method is tuned towards the typical requirements of current multi-view stereo algorithms, and is based on the idea of incrementally selecting images so that the overall coverage of a simultaneously generated proxy is guaranteed without adding too much redundant information. Critical regions such as cavities are detected by an estimate of the local photo-consistency and are improved by adding additional views. Our method is highly efficient, since most computations can be out-sourced to the GPU. We evaluate our method with four different methods participating in the Middlebury benchmark and show that in each case reconstructions based on our selected images yield an improved output quality while at the same time reducing the processing time considerably.
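The idea of incrementally selecting images until a proxy is covered can be sketched as a greedy coverage loop. This is a generic illustration, not the scheme from the paper: it assumes a precomputed boolean visibility matrix between images and proxy points, ignores photo-consistency and GPU evaluation, and the names `select_views`, `visibility` and `min_views` are hypothetical.

```python
import numpy as np

def select_views(visibility, min_views=2):
    """visibility: (n_images, n_points) boolean matrix, True where an image
    sees a proxy point. Greedily adds the image that covers the most points
    still seen by fewer than min_views selected images."""
    n_images, n_points = visibility.shape
    counts = np.zeros(n_points, dtype=int)
    selected = []
    remaining = set(range(n_images))
    while remaining:
        need = counts < min_views
        gains = {i: int(np.sum(visibility[i] & need)) for i in remaining}
        # Highest gain wins; ties go to the lower image index.
        best = max(gains, key=lambda i: (gains[i], -i))
        if gains[best] == 0:
            break  # every remaining image would only add redundancy
        selected.append(best)
        remaining.remove(best)
        counts += visibility[best]
    return selected

# Toy visibility (assumed): 4 images, 5 proxy points.
visibility = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],   # redundant with image 0
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0],
], dtype=bool)

chosen = select_views(visibility, min_views=1)
```

Raising `min_views` for points in difficult regions would be one way to model the extra redundancy the abstract mentions for cavities; that extension is left out here.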
- Research Article
4
- 10.1109/tim.2023.3250231
- Jan 1, 2023
- IEEE Transactions on Instrumentation and Measurement
Multiview stereo (MVS) aims to measure the precise surface depth of a scene from observations at multiple photography angles and then densely reconstruct its 3-D geometry information. Learning-based MVS approaches have been dominantly popular for their robustness to low texture areas and non-Lambertian surfaces. However, most existing methods focus on estimating depth maps for input images by constructing global cost volumes and designing ingenious yet large variance-based 3-D-CNNs for cost volume regularization. Such approaches ignore the co-visible relationship embedded in multiple views, resulting in heavy computation, erroneous cost aggregation from invisible views, and finally inaccurate 3-D reconstruction results. In this article, we propose a co-visibility reasoning MVS network (CR-MVSNet) to explore the co-visible relationships hidden in multiple views for reliable multiview similarity measurement and efficient reconstruction. Precisely, the proposed co-visibility reasoning cost aggregation (CRCA) module includes the adaptive intercost volume aggregation via mining the uncertainty of co-visibility relationships in multiple views and the adaptive intracost volume aggregation by exploiting spatial contextual information. Moreover, the cost volumes are constructed via the proposed global-to-patch manner to speed up computation. Experimental results show that our approach achieves the best overall performance on the DTU, Tanks and Temples, and ETH3D-test datasets over recent state-of-the-art MVS algorithms. The consistently favorable results on three datasets with completely different depth ranges proved the superiority and generalizability of CR-MVSNet.
- Research Article
12
- 10.1080/00405000.2020.1862479
- Dec 28, 2020
- The Journal of The Textile Institute
Fabric pilling evaluation is very important for quality control in the textile industry. Traditional methods based on image analysis suffer from the disadvantages of 2D imaging and color sensitivity. This paper presents a new method based on stereo vision to solve the problem of 3D imaging of fabric pilling. A self-developed mobile camera system is established to capture a group of images for the 3D reconstruction of the fabric surface. The point cloud model of the fabric surface is generated by a self-developed stereo vision algorithm, including structure from motion (SFM) and the patch-based multi-view stereo (PMVS) algorithm. A 2D gray-scale image containing the depth information of fabric pilling is obtained from the 3D point cloud model by mapping it to the 2D image plane. The fabric pilling is segmented by accurate positioning through edge detection, adaptive thresholding and morphological analysis. Four feature parameters, namely pilling number, pilling area, pilling density and coverage ratio, are extracted to determine the fabric pilling grade objectively. Experimental results show that the newly developed system and method are effective and reliable for fabric pilling evaluation and consistent with subjective pilling evaluation. The method also works for color-printed or yarn-dyed fabrics, and the proposed imaging system could be a good solution for digital intelligent quality control of textile products.
- Research Article
4
- 10.1007/s00371-017-1430-5
- Aug 28, 2017
- The Visual Computer
This paper presents a hybrid approach for 3D reconstruction by fusing photometric stereo and multi-view stereo. The 3D surface is obtained by capturing a set of images taken from different viewpoints under time-varying illuminations. Key factors in the reconstruction process are surface normals that are obtained from photometric stereo. The surface is initialized by integrating the normals and then refined by performing iterative deformations on the initial surface and thereby optimizing image and normal consistency in multiple views. Benefiting from the employment of the deformation approach, we are able to perform image and normal consistency optimization without using matching windows. Instead, the complete surface is always back-projected. This makes the proposed approach much simpler and more robust compared to window-based approaches, which typically require global optimization with constraints on neighboring windows. Experiments on real-world data and ground-truth data show that for diffuse midsized objects without large depth discontinuities our approach improves the accuracy of the reconstructions compared to existing approaches.
- Conference Article
5
- 10.1145/3384382.3384530
- May 5, 2020
We present a user-guided system for accessible 3D reconstruction and modeling of real-world objects using multi-view stereo. The system is an interactive tool where the user models the object on top of multiple selected photographs. Our tool helps the user place quads correctly aligned to the photographs using a multi-view stereo algorithm. This algorithm in combination with user-provided information about topology, visibility, and how to separate foreground from background, creates favorable conditions in successfully reconstructing the object. The user only needs to manually specify a coarse topology which, followed by subdivision and a global optimization algorithm, creates an accurate model with the desired mesh density. This global optimization algorithm has a higher probability of converging to an accurate result than a fully automatic system. With our proposed tool, we lower the barrier of entry for creating high-quality 3D reconstructions of real-world objects with a desirable topology. Our interactive tool separates the most tedious and difficult parts of modeling to the computer, while giving the user control over the most common robustness issues in automatic 3D reconstruction. The provided workflow can be a preferable alternative to using automatic scanning techniques followed by re-topologization.
- Book Chapter
1
- 10.4018/978-1-4666-3994-2.ch009
- Jan 1, 2013
3D modeling of complex objects is an important task of computer graphics and poses substantial difficulties to traditional synthetic modeling approaches. The multi-view stereo reconstruction technique, which tries to automatically acquire object models from multiple photographs, provides an attractive alternative. The whole reconstruction process of the multi-view stereo technique is introduced in this chapter, from camera calibration and image acquisition to various reconstruction algorithms. The shape from silhouette technique is also introduced since it provides a close shape approximation for many multi-view stereo algorithms. Various multi-view algorithms have been proposed, which can be mainly classified into four classes: 3D volumetric, surface evolution, feature extraction and expansion, and depth map based approaches. This chapter explains the underlying theory and pipeline of each class in detail and analyzes their major properties. Two published benchmarks that are used to qualitatively evaluate multi-view stereo algorithms are presented, along with the benchmark criteria and evaluation results.
- Research Article
555
- 10.1561/0600000052
- Jun 24, 2015
- Foundations and Trends in Computer Graphics and Vision
This tutorial presents a hands-on view of the field of multi-view stereo with a focus on practical algorithms. Multi-view stereo algorithms are able to construct highly detailed 3D models from images alone. They take a possibly very large set of images and construct a 3D plausible geometry that explains the images under some reasonable assumptions, the most important being scene rigidity. The tutorial frames the multi-view stereo problem as an image/geometry consistency optimization problem. It describes in detail its main two ingredients: robust implementations of photometric consistency measures, and efficient optimization algorithms. It then presents how these main ingredients are used by some of the most successful algorithms, applied into real applications, and deployed as products in the industry. Finally it describes more advanced approaches exploiting domain-specific knowledge such as structural priors, and gives an overview of the remaining challenges and future research directions.
- Research Article
11
- 10.1111/jmi.13040
- Jul 2, 2021
- Journal of Microscopy
Three-dimensional (3D) transfer functions build the basis for a comprehensive characterization of optical imaging systems in the spatial frequency domain. Utilizing the projection-slice theorem, the 2D modulation transfer function of an incoherent imaging system can be derived from a 3D transfer function by integration with respect to the axial spatial frequency. For a diffraction limited microscope with homogeneous incoherent pupil illumination, the modulation transfer function equals the 2D autocorrelation function of a circular disc. However, until now to the best of our knowledge no 3D transfer function has been published, which exactly leads to the 2D modulation transfer function of a diffraction limited microscope in reflection mode. In this article, we derive a formula, which after integration with respect to the axial spatial frequency coordinate perfectly fits to the diffraction limited 2D modulation transfer function. The inverse three-dimensional Fourier transform of the 3D transfer function results in a complex-valued 3D point spread function, from which the depth of field, the lateral resolution and, in addition, the corresponding 3D point spread function of both a conventional and an interference microscope can be obtained.
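For reference, the autocorrelation of a circular disc mentioned in this abstract has a well-known closed form. The expression below is the standard textbook result for an aberration-free incoherent system with a circular pupil, not a formula quoted from the article; the symbols $\nu$ (lateral spatial frequency), $\nu_c$ (cutoff frequency), $\mathrm{NA}$ (numerical aperture) and $\lambda$ (wavelength) are notation assumed here.

```latex
\mathrm{MTF}(\nu) =
\begin{cases}
\dfrac{2}{\pi}\left[\arccos\!\left(\dfrac{\nu}{\nu_c}\right)
  - \dfrac{\nu}{\nu_c}\sqrt{1-\left(\dfrac{\nu}{\nu_c}\right)^{2}}\,\right],
  & 0 \le \nu \le \nu_c,\\[1ex]
0, & \nu > \nu_c,
\end{cases}
\qquad \nu_c = \frac{2\,\mathrm{NA}}{\lambda}.
```

As a quick check, the expression gives $\mathrm{MTF}(0) = 1$ and falls to zero at the cutoff $\nu = \nu_c$.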
- Research Article
32
- 10.1088/1757-899x/1073/1/012066
- Feb 1, 2021
- IOP Conference Series: Materials Science and Engineering
With the development of the Information and Computer Technology (ICT) sector, three-dimensional (3D) technology is also growing rapidly. Currently, visualization of 3D objects is widely used in animation and graphic applications, architecture, education, cultural recognition and Virtual Reality. 3D modeling of historic buildings has become a concern in recent years. 3D reconstruction is an attempt at documentation to support reconstruction or restoration if the building is destroyed. By reconstructing 3D models with Structure from Motion (SFM) and Multi View Stereo (MVS) algorithms based on Computer Vision, it is hoped that the results of this 3D modeling can be utilized as an effort to preserve 3D objects in the Penataran Temple cultural heritage area. This research was conducted by taking 61 images of objects in the Blitar Penataran Temple area. The photos obtained were reconstructed into a 3D model using the Structure from Motion algorithm in Meshroom. In this research, a trial reconstruction with the original images and with compressed images is used to compare the 3D reconstruction process for the two sets of input data. Of the 61 images processed using the Structure from Motion algorithm, 33 camera poses and 3D points were improved, for both the original and the compressed images. With the compressed images, the number of iterations is 1.4% lower and processing is 43.53% faster than with the original images.
- Research Article
37
- 10.3390/rs8050381
- May 4, 2016
- Remote Sensing
This paper presents a novel multi-view dense point cloud generation algorithm based on low-altitude remote sensing images. The proposed method was designed to be especially effective in enhancing the density of point clouds generated by Multi-View Stereo (MVS) algorithms. To overcome the limitations of MVS and dense matching algorithms, an expanded patch was set up for each point in the point cloud. Then, a patch-based Multiphoto Geometrically Constrained Matching (MPGC) was employed to optimize points on the patch based on least square adjustment, the space geometry relationship, and epipolar line constraint. The major advantages of this approach are twofold: (1) compared with the MVS method, the proposed algorithm can achieve denser three-dimensional (3D) point cloud data; and (2) compared with the epipolar-based dense matching method, the proposed method utilizes redundant measurements to weaken the influence of occlusion and noise on matching results. Comparison studies and experimental results have validated the accuracy of the proposed algorithm in low-altitude remote sensing image dense point cloud generation.