Active 3D Modeling via Online Multi-View Stereo
Multi-view stereo (MVS) algorithms have been commonly used to model large-scale structures. In MVS, image acquisition is an important issue because reconstruction quality depends heavily on the acquired images. Recently, an explore-then-exploit strategy has been used to acquire images for MVS. This method first constructs a coarse model by exploring an entire scene using a pre-allocated camera trajectory. Then, it rescans the unreconstructed regions identified from the coarse model. However, this strategy is inefficient because the initial and rescanning trajectories frequently overlap. Furthermore, even given complete image coverage, MVS algorithms do not guarantee an accurate reconstruction result. In this study, we propose a novel view path-planning method based on an online MVS system. This method aims to incrementally construct the target three-dimensional (3D) model in real time. View paths are continually planned based on online feedback from the partially constructed model. The obtained paths fully cover low-quality surfaces while maximizing the reconstruction performance of MVS. Experimental results demonstrate that the proposed method can construct high-quality 3D models with one exploration trial, without any rescanning trial as in the explore-then-exploit method.
- Research Article
46
- 10.1016/j.aei.2023.102196
- Sep 28, 2023
- Advanced Engineering Informatics
Improving completeness and accuracy of 3D point clouds by using deep learning for applications of digital twins to civil structures
- Dissertation
- 10.14711/thesis-991012786067603412
- Jan 1, 2019
Multi-view stereo (MVS) reconstructs 3D representations of a scene from imagery and is a core problem of computer vision that has been studied extensively for decades. Traditionally, MVS algorithms apply hand-crafted similarity metrics and engineered regularizations to compute dense correspondences. While these methods have shown great results under ideal Lambertian scenarios, classical MVS algorithms still suffer from numerous artifacts. In this thesis, we propose to advance MVS reconstruction using recent deep learning techniques. First, we present an end-to-end deep learning architecture, MVSNet, for depth map inference from multi-view images. The key contribution of this part is the careful integration of multi-view geometry and convolutional neural networks (CNNs). In the network, we extract deep image features and build the 3D cost volume upon the camera frustum via differentiable homography warping. Then, 3D convolutions are applied to regularize and regress the output depth map. We demonstrate on the DTU dataset that MVSNet significantly outperforms previous state-of-the-art methods in both reconstruction completeness and overall quality. Next, we propose to extend the MVSNet architecture for large-scale MVS reconstruction. One major limitation of current learning-based approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes. To this end, we sequentially regularize 2D cost maps via a gated recurrent unit (GRU) rather than regularizing the entire 3D cost volume in one go. The GRU regularization dramatically reduces memory consumption and makes high-resolution reconstruction feasible. The proposed R-MVSNet is evaluated on the large-scale Tanks and Temples dataset and achieves results comparable to classical large-scale MVS algorithms. Finally, we establish a large-scale synthetic MVS dataset, BlendedMVS, based on blended images and rendered depth maps.
While several MVS datasets have been proposed, they fail to provide accurate depth and occlusion information because ground truth mesh models are usually incomplete. We therefore establish a new MVS dataset based on model rendering. Textured meshes are first reconstructed from images of different scenes and are then rendered into color images, depth maps and occlusion maps. We further blend rendered images with input images using high-pass and low-pass filters to generate our training input. Extensive experiments demonstrate that models trained on BlendedMVS achieve significantly better generalization ability than models trained on other MVS datasets. In sum, this thesis presents a complete learning-based solution to large-scale multi-view stereopsis, including a current baseline network (MVSNet), its large-scale extension (R-MVSNet) and a large-scale synthetic dataset (BlendedMVS). We bridge the gap between classical MVS reconstructions and recent deep learning techniques and demonstrate the effectiveness of learning-based MVS through extensive experiments on different datasets.
- Research Article
1
- 10.1007/s11063-018-9816-6
- Mar 14, 2018
- Neural Processing Letters
Multi-view stereo (MVS) map-based 3D range reconstruction generates 3D ranges by analyzing snapshots of the surroundings taken from different perspectives. Unlike the traditional method, which employs expensive and difficult-to-maintain laser range devices to calibrate the range of real 3D objects, MVS succeeds by seeking the geometrical correlations between correspondences in snapshots from different perspectives. Interest in MVS keeps rising thanks to the fast development of digital maps and 3D printing. Several MVS algorithms have been well developed and have succeeded in reconstructing 3D ranges. Meanwhile, most of these algorithms focus mainly on the fusion and merging of different scenes and on surface refinement. Because feature matching algorithms cope poorly with affine-transformed images, current MVS algorithms need a huge number of images with tiny perspective differences. In this paper, we propose a new MVS algorithm that deploys our previously published Affine Invariant Feature Descriptor (AIFD) to detect and match correspondences from different perspectives and applies a homography matrix and segmentation to define the planes of the objects. Thanks to the AIFD and the homography-based projection model, our proposed MVS algorithm outperforms other MVS algorithms in terms of speed and efficiency.
- Research Article
32
- 10.1088/1757-899x/1073/1/012066
- Feb 1, 2021
- IOP Conference Series: Materials Science and Engineering
With the development of the Information and Computer Technology (ICT) sector, three-dimensional (3D) technology is also growing rapidly. Currently, 3D object visualization is widely used in animation and graphic applications, architecture, education, cultural recognition and Virtual Reality. 3D modeling of historic buildings has become a concern in recent years. 3D reconstruction is an attempt to provide documentation for reconstruction or restoration if the building is destroyed. By reconstructing 3D models with Structure from Motion (SFM) and Multi View Stereo (MVS) algorithms based on Computer Vision, it is hoped that the results of this 3D modeling can be utilized as an effort to preserve 3D objects in the Penataran Temple cultural heritage area. This research was conducted by taking 61 images of objects in the Blitar Penataran Temple area. The photos obtained were reconstructed into a 3D model using the Structure from Motion algorithm in Meshroom. In this research, a trial with original and compressed images is used to compare the 3D reconstruction process for the two sets of input data. From 61 images processed using the Structure from Motion algorithm, 33 camera poses and 3D points were improved, for both original and compressed images. With compressed images, the number of iterations is 1.4% lower than with the original images, and processing is 43.53% faster.
- Research Article
58
- 10.3390/buildings9030070
- Mar 20, 2019
- Buildings
This research presents a novel method for automated construction progress monitoring. Using the proposed method, an accurate and complete 3D point cloud is generated for automatic outdoor and indoor progress monitoring throughout the project duration. In this method, Structure-from-Motion (SFM) and Multi-View-Stereo (MVS) algorithms coupled with photogrammetric principles for coded target detection are exploited to generate as-built 3D point clouds. The coded targets are utilized to automatically resolve the scale and increase the accuracy of the point cloud generated using SFM and MVS methods. Having generated the point cloud, the CAD model is generated from the as-built point cloud and compared with the as-planned model. Finally, the quantity of the performed work is determined in two real case study projects. The proposed method is compared to the Structure-from-Motion (SFM)/Clustering Multi-View Stereo (CMVS)/Patch-based Multi-View Stereo (PMVS) algorithm, a common method for generating 3D point cloud models. The proposed photogrammetric Multi-View Stereo method achieves an accuracy of around 99 percent and generates less noise than the SFM/CMVS/PMVS algorithm. It is observed that the proposed method substantially improves the accuracy of the generated point cloud compared to the SFM/CMVS/PMVS algorithm. It is believed that the proposed method may present a novel and robust tool for automated progress monitoring in construction projects.
- Research Article
37
- 10.3390/rs8050381
- May 4, 2016
- Remote Sensing
This paper presents a novel multi-view dense point cloud generation algorithm based on low-altitude remote sensing images. The proposed method was designed to be especially effective in enhancing the density of point clouds generated by Multi-View Stereo (MVS) algorithms. To overcome the limitations of MVS and dense matching algorithms, an expanded patch was set up for each point in the point cloud. Then, a patch-based Multiphoto Geometrically Constrained Matching (MPGC) was employed to optimize points on the patch based on least square adjustment, the space geometry relationship, and epipolar line constraint. The major advantages of this approach are twofold: (1) compared with the MVS method, the proposed algorithm can achieve denser three-dimensional (3D) point cloud data; and (2) compared with the epipolar-based dense matching method, the proposed method utilizes redundant measurements to weaken the influence of occlusion and noise on matching results. Comparison studies and experimental results have validated the accuracy of the proposed algorithm in low-altitude remote sensing image dense point cloud generation.
- Book Chapter
2
- 10.1007/978-3-642-38267-3_20
- Jan 1, 2013
Surface reconstruction using patch-based multi-view stereo commonly assumes that the underlying surface is locally planar. This is typically not true, so least-squares fitting of a planar patch leads to systematic errors, which are of particular importance for multi-scale surface reconstruction. In a recent paper [12], we determined the modulation transfer function of a classical patch-based stereo system. Our key insight was that the reconstructed surface is a box-filtered version of the original surface. Since the box filter is not a true low-pass filter, this causes high-frequency artifacts. In this paper, we propose an extended reconstruction model by weighting the least-squares fit of the 3D patch. We show that if the weighting function meets specified criteria, the reconstructed surface is the convolution of the original surface with that weighting function. A choice of particular interest is the Gaussian, which is commonly used in image and signal processing but left unexploited by many multi-view stereo algorithms. Finally, we demonstrate the effects of our theoretical findings using experiments on synthetic and real-world data sets.
Keywords: multi-view stereo, multi-scale surface reconstruction
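The link between a weighted patch fit and filtering can be checked numerically in a simplified setting. The sketch below is not the paper's model: it assumes a 1D height profile sampled at integer offsets, a symmetric window, and symmetric weights, and it uses hypothetical names (`weighted_line_fit_center`, `gaussian_weights`). Under those assumptions the weighted least-squares line, evaluated at the patch centre, equals the weighted mean of the samples, which is one tap of a convolution with the normalized weighting function.

```python
import math

def weighted_line_fit_center(z, weights):
    """Fit z ~ a + b*x by weighted least squares on offsets x = -r..r; return a."""
    if len(z) != len(weights) or len(z) % 2 == 0 or len(z) < 3:
        raise ValueError("need equally long, odd-length inputs with >= 3 samples")
    r = len(z) // 2
    xs = list(range(-r, r + 1))
    s_w = sum(weights)
    s_wx = sum(w * x for w, x in zip(weights, xs))
    s_wxx = sum(w * x * x for w, x in zip(weights, xs))
    s_wz = sum(w * v for w, v in zip(weights, z))
    s_wxz = sum(w * x * v for w, x, v in zip(weights, xs, z))
    # Solve the 2x2 normal equations for the intercept a (value at x = 0).
    det = s_w * s_wxx - s_wx * s_wx
    return (s_wz * s_wxx - s_wx * s_wxz) / det

def weighted_mean(z, weights):
    """One output sample of a convolution with the normalized weights."""
    return sum(w * v for w, v in zip(weights, z)) / sum(weights)

def gaussian_weights(radius, sigma):
    """Symmetric Gaussian weights on offsets -radius..radius (sigma is assumed)."""
    return [math.exp(-(x * x) / (2.0 * sigma * sigma))
            for x in range(-radius, radius + 1)]

profile = [0.0, 0.2, 1.0, 0.1, -0.3]   # made-up surface heights
box = [1.0] * 5                        # unweighted fit, i.e. a box filter
gauss = gaussian_weights(2, 1.0)

print(weighted_line_fit_center(profile, box), weighted_mean(profile, box))
print(weighted_line_fit_center(profile, gauss), weighted_mean(profile, gauss))
```

With uniform weights the fit reproduces a box average, while Gaussian weights give a Gaussian-smoothed value. The identity relies on the weights being symmetric about the patch centre.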
- Research Article
90
- 10.3390/app122412886
- Dec 15, 2022
- Applied Sciences
In recent years, structure from motion (SfM) and multi-view stereo (MVS) algorithms have been successfully applied to stereo images generated by cameras mounted on unmanned aerial vehicle (UAV) platforms to build 3D models. Indeed, the approach based on the combination of SfM-MVS and UAV-generated images allows for cost-effective acquisition, fast and automated processing, and detailed and accurate reconstruction of 3D models. As a consequence, this approach has become very popular for representation, management, and conservation in the field of cultural heritage (CH). Therefore, this review paper discusses the use of UAV photogrammetry in CH environments with a focus on state of the art trends and best practices in image acquisition technologies and 3D model-building software. In particular, this paper intends to emphasise the different techniques of image acquisition and processing in relation to the different platforms and navigation systems available, as well as to analyse and deepen the aspects of 3D reconstruction that efficiently describe the entire photogrammetric process, providing further insights for new applications in different fields, such as structural engineering and conservation and maintenance restoration of sites and structures belonging to the CH field.
- Research Article
13
- 10.1587/transinf.2014edp7409
- Jan 1, 2015
- IEICE Transactions on Information and Systems
Methods of window matching to estimate 3D points are the most serious factors affecting the accuracy, robustness, and computational cost of Multi-View Stereo (MVS) algorithms. Most existing MVS algorithms employ window matching based on Normalized Cross-Correlation (NCC) to estimate the depth of a 3D point. NCC-based window matching estimates the displacement between matching windows with sub-pixel accuracy by linear/cubic interpolation, which does not represent accurate sub-pixel values of matching windows. This paper proposes a highly accurate window matching technique for MVS using Phase-Only Correlation (POC) with geometric correction. The accurate sub-pixel displacement between two matching windows can be estimated by fitting the analytical correlation peak model of the POC function. The proposed method also corrects the geometric transformations of matching windows by taking into consideration the 3D shape of a target object. The proposed geometric correction approach makes it possible to achieve accurate 3D reconstruction from multi-view images even for images with large transformations. The proposed method demonstrates more accurate 3D reconstruction from multi-view images than the conventional methods.
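The two matching scores named in this abstract can be illustrated with a small 1D sketch. It is a simplification, not the paper's method: it handles only integer, circular shifts of a 1D signal, uses a naive DFT, and does not reproduce the analytical peak-model fitting or the geometric correction. The function names (`ncc`, `dft`, `poc_shift`) and the sample signal are illustrative assumptions.

```python
import cmath
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized windows."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def dft(signal, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse includes 1/N."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, value in enumerate(signal):
            acc += value * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def poc_shift(f, g, eps=1e-12):
    """Estimate the integer circular shift d such that g[n] = f[n - d]."""
    spec_f = dft(f)
    spec_g = dft(g)
    cross = [gk * fk.conjugate() for fk, gk in zip(spec_f, spec_g)]
    # Keep only the phase of the cross spectrum (the "phase-only" step).
    phase = [c / abs(c) if abs(c) > eps else 0j for c in cross]
    poc = [v.real for v in dft(phase, inverse=True)]
    return max(range(len(poc)), key=poc.__getitem__)

signal = [1.0, 3.0, -2.0, 5.0, 0.0, 4.0, -1.0, 2.0]   # made-up window
shifted = signal[-3:] + signal[:-3]                   # circular shift by 3
print(ncc(signal, signal), poc_shift(signal, shifted))
```

NCC returns a similarity score for one candidate alignment, whereas the POC function yields a correlation peak whose location encodes the displacement. A sub-pixel estimate would additionally require fitting a peak model around that location, which is omitted here.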
- Research Article
72
- 10.1016/j.patcog.2019.107112
- Nov 14, 2019
- Pattern Recognition
Depth-map completion for large indoor scene reconstruction
- Research Article
1
- 10.3389/fpls.2025.1610577
- Oct 13, 2025
- Frontiers in Plant Science
Introduction: Plant type is an important part of plant phenotypic research, which is of great significance for practical applications such as plant genomics and cultivation knowledge modeling. Existing plant type judgment relies mainly on subjective experience and lacks automatic analysis and identification methods, which seriously restricts the progress of efficient crop breeding and precision cultivation.
Methods: In this study, a digital structure model of the cotton plant was constructed based on multi-dimensional vision, and a rapid analysis and identification method for cotton plant type was established. Fifty cotton plants were used as experimental objects. First, multi-view images of cotton plants at the boll opening stage were collected, and a three-dimensional point cloud model of the cotton plants was constructed based on the Structure From Motion and Multi View Stereo (SFM-MVS) algorithm. The original cotton point cloud data was preprocessed by coordinate correction, statistical filtering, conditional filtering and down-sampling to obtain a high-quality three-dimensional model. The three-dimensional model was projected into two dimensions to obtain two-dimensional projection data of the cotton plants from multiple perspectives. Second, based on the fast convex hull algorithm, the two-dimensional convex hull of the cotton plant was constructed from multiple perspectives, the distribution range and corner change rate of each corner of the convex hull were analyzed, and the identification basis of cotton plant type was established.
Results: The R² values of plant height and width extracted from the model were greater than 0.90, and the RMSEs were 0.372 cm and 0.387 cm, respectively. When the maximum number of points in the point cloud is 75335, the point cloud reading time, cotton multi-view projection time, and automatic convex hull construction time are 0.402 s, 2.275 s, and 0.018 s, respectively. Finally, the cotton cylinder type classification interval is 0-0.2, and the tower type classification interval is 0.4-1.5.
Discussion: The cotton plant type identification method proposed in this study is fast and efficient. It provides a solid theoretical basis and technical support for cotton plant type identification.
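The projection and convex hull steps described in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the viewing model (rotate about the vertical z axis, then drop the depth coordinate) is assumed, and Andrew's monotone chain is used here in place of the "fast convex hull algorithm" the abstract mentions. The names `project_side_view` and `convex_hull` are hypothetical.

```python
import math

def project_side_view(points, angle_deg):
    """Project 3D points (x, y, z) to 2D (horizontal, height) for one viewing angle."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(x * cos_a + y * sin_a, z) for x, y, z in points]

def _cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points_2d):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points_2d))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

# Made-up toy "plant": four outline points plus one interior point.
cloud = [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1), (0.5, 0, 0.5)]
for angle in (0, 45, 90):
    hull = convex_hull(project_side_view(cloud, angle))
    print(angle, len(hull))
```

Features such as the corner distribution would then be computed from the hull vertices for each viewing angle. The abstract does not define those features precisely, so they are left out.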
- Conference Article
5
- 10.1145/3384382.3384530
- May 5, 2020
We present a user-guided system for accessible 3D reconstruction and modeling of real-world objects using multi-view stereo. The system is an interactive tool where the user models the object on top of multiple selected photographs. Our tool helps the user place quads correctly aligned to the photographs using a multi-view stereo algorithm. This algorithm in combination with user-provided information about topology, visibility, and how to separate foreground from background, creates favorable conditions in successfully reconstructing the object. The user only needs to manually specify a coarse topology which, followed by subdivision and a global optimization algorithm, creates an accurate model with the desired mesh density. This global optimization algorithm has a higher probability of converging to an accurate result than a fully automatic system. With our proposed tool, we lower the barrier of entry for creating high-quality 3D reconstructions of real-world objects with a desirable topology. Our interactive tool separates the most tedious and difficult parts of modeling to the computer, while giving the user control over the most common robustness issues in automatic 3D reconstruction. The provided workflow can be a preferable alternative to using automatic scanning techniques followed by re-topologization.
- Conference Article
42
- 10.1109/cvpr.2008.4587688
- Jun 1, 2008
The Middlebury multi-view stereo evaluation clearly shows that the quality and speed of most multi-view stereo algorithms depend significantly on the number and selection of input images. In general, not all input images contribute equally to the quality of the output model, since several images may often contain similar and hence overly redundant visual information. This leads to unnecessarily increased processing times. On the other hand, a certain degree of redundancy can help to improve the reconstruction in more "difficult" regions of a model. In this paper we propose an image selection scheme for multi-view stereo which results in improved reconstruction quality compared to uniformly distributed views. Our method is tuned towards the typical requirements of current multi-view stereo algorithms, and is based on the idea of incrementally selecting images so that the overall coverage of a simultaneously generated proxy is guaranteed without adding too much redundant information. Critical regions such as cavities are detected by an estimate of the local photo-consistency and are improved by adding additional views. Our method is highly efficient, since most computations can be outsourced to the GPU. We evaluate our method with four different methods participating in the Middlebury benchmark and show that in each case reconstructions based on our selected images yield an improved output quality while at the same time reducing the processing time considerably.
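The idea of incrementally selecting images for coverage can be illustrated with a generic greedy sketch. This is not the paper's scheme: it ignores photo-consistency and GPU evaluation, and it assumes that a hypothetical visibility table (view id mapped to the set of proxy surface elements it sees) is already available. The name `select_views` and the `min_gain` threshold are illustrative assumptions.

```python
def select_views(visibility, min_gain=1):
    """Greedily pick views that add the most not-yet-covered proxy elements.

    visibility: dict mapping a view id to the set of proxy element ids it sees.
    min_gain:   stop once the best remaining view adds fewer new elements.
    Returns (chosen view ids in selection order, set of covered elements).
    """
    remaining = {view: set(seen) for view, seen in visibility.items()}
    covered = set()
    chosen = []
    while remaining:
        # Ties are broken by dictionary insertion order.
        best = max(remaining, key=lambda view: len(remaining[view] - covered))
        gain = len(remaining[best] - covered)
        if gain < min_gain:
            break  # every remaining view is (almost) fully redundant
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered

# Made-up example: view "c" only repeats what "a" already covers.
views = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}}
print(select_views(views))
```

A scheme that deliberately keeps some redundancy in difficult regions, as the abstract describes, would need a per-element target count rather than the simple covered/uncovered flag used here.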
- Research Article
6
- 10.1080/00405000.2021.1882071
- Feb 23, 2021
- The Journal of The Textile Institute
Traditionally, fabric wrinkle assessment relies on the human eye, which is subjective and suffers from low efficiency and accuracy. In this paper, an objective fabric wrinkle evaluation method is proposed to solve the problems of subjective evaluation. A self-developed multiple image acquisition system was established to capture images of wrinkled fabrics from different directions. The three-dimensional (3D) surface profile of the wrinkled fabric was reconstructed by an improved patch-based multi-view stereo vision algorithm. A two-dimensional depth image was generated directly from the 3D point cloud model, after which four texture feature parameters were extracted from the depth image using the gray level co-occurrence matrix. Finally, these four feature parameters were used as the input vector and the wrinkle grade as the output of a support vector machine for the objective assessment of fabric wrinkle appearance. Our experimental results indicated that the recognition accuracy of the proposed method and system was more than 90%. The originality of our research is that wrinkle features based on 3D surface profile reconstruction can avoid interference caused by fabric color and texture and further improve the recognition accuracy of objective evaluation.
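The gray level co-occurrence matrix step can be sketched in a few lines. The abstract does not say which four texture parameters were used, so the set below (contrast, energy as angular second moment, homogeneity, entropy) is an assumption, as are the single horizontal pixel offset and the names `glcm` and `glcm_features`. The input is a depth image already quantized to integer gray levels.

```python
import math

def glcm(image, levels, d_row=0, d_col=1):
    """Normalized (non-symmetric) co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Contrast, energy (ASM), homogeneity and entropy of a normalized GLCM."""
    contrast = energy = homogeneity = entropy = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += v * (i - j) ** 2
            energy += v * v
            homogeneity += v / (1.0 + (i - j) ** 2)
            if v > 0:
                entropy -= v * math.log(v)
    return contrast, energy, homogeneity, entropy

# Made-up 4-level depth image: smooth on the left, rougher on the right.
depth = [
    [0, 0, 1, 3],
    [0, 0, 2, 1],
    [0, 1, 3, 2],
]
print(glcm_features(glcm(depth, levels=4)))
```

The resulting feature tuple is the kind of vector that would be passed to a classifier such as a support vector machine. Training that classifier is outside the scope of this sketch.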
- Book Chapter
1
- 10.4018/978-1-4666-3994-2.ch009
- Jan 1, 2013
3D modeling of complex objects is an important task of computer graphics and poses substantial difficulties to traditional synthetic modeling approaches. The multi-view stereo reconstruction technique, which tries to automatically acquire object models from multiple photographs, provides an attractive alternative. The whole reconstruction process of the multi-view stereo technique is introduced in this chapter, from camera calibration and image acquisition to various reconstruction algorithms. The shape from silhouette technique is also introduced since it provides a close shape approximation for many multi-view stereo algorithms. Various multi-view algorithms have been proposed, which can be mainly classified into four classes: 3D volumetric, surface evolution, feature extraction and expansion, and depth map based approaches. This chapter explains the underlying theory and pipeline of each class in detail and analyzes their major properties. Two published benchmarks that are used to qualitatively evaluate multi-view stereo algorithms are presented, along with the benchmark criteria and evaluation results.