Active 3D Modeling via Online Multi-View Stereo

Abstract

Multi-view stereo (MVS) algorithms are commonly used to model large-scale structures. In MVS, image acquisition is an important issue because reconstruction quality depends heavily on the acquired images. Recently, an explore-then-exploit strategy has been used to acquire images for MVS. This method first constructs a coarse model by exploring an entire scene using a pre-allocated camera trajectory. Then, it rescans the regions that remain unreconstructed in the coarse model. However, this strategy is inefficient because the initial and rescanning trajectories frequently overlap. Furthermore, even given complete image coverage, MVS algorithms do not guarantee an accurate reconstruction. In this study, we propose a novel view path-planning method based on an online MVS system. This method aims to incrementally construct the target three-dimensional (3D) model in real time. View paths are continually planned based on online feedback from the partially constructed model. The obtained paths fully cover low-quality surfaces while maximizing the reconstruction performance of MVS. Experimental results demonstrate that the proposed method can construct high-quality 3D models in a single exploration trial, without the rescanning trial required by the explore-then-exploit method.
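
The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' planner: it swaps in a plain greedy coverage heuristic, and the names `plan_views`, `visible`, `quality`, `threshold`, and `gain` are all hypothetical. It assumes each surface patch carries a quality score in [0, 1] and that observing a patch raises its score by a fixed amount.

```python
from typing import Dict, Hashable, List, Set, Tuple


def plan_views(
    visible: Dict[str, Set[Hashable]],
    quality: Dict[Hashable, float],
    threshold: float = 0.5,
    gain: float = 0.5,
    max_views: int = 10,
) -> Tuple[List[str], Dict[Hashable, float]]:
    """Greedily pick views that cover the most low-quality patches.

    visible:  hypothetical map from candidate view name to the patches it sees.
    quality:  hypothetical per-patch quality score of the partial model.
    Every patch listed in `visible` must also appear in `quality`.
    """
    quality = dict(quality)  # work on a copy so the caller's scores are untouched
    path: List[str] = []
    candidates = sorted(visible)  # fixed order makes tie-breaking deterministic
    for _ in range(max_views):
        # Feedback step: find patches the partial model still reconstructs poorly.
        low = {p for p, q in quality.items() if q < threshold}
        if not low or not candidates:
            break
        # Planning step: choose the view that sees the most low-quality patches.
        best = max(candidates, key=lambda v: len(visible[v] & low))
        if not visible[best] & low:
            break  # no candidate view can improve the remaining patches
        path.append(best)
        # Assumed update model: each observation adds a fixed quality gain.
        for p in visible[best]:
            quality[p] = min(1.0, quality[p] + gain)
    return path, quality


views = {"A": {1, 2}, "B": {2, 3}, "C": {3}}
path, final_quality = plan_views(views, {1: 0.0, 2: 0.0, 3: 0.0})
print(path)  # ['A', 'B']
```

The point of the sketch is only the loop structure: quality is re-evaluated after every chosen view, so no separate rescanning pass is needed.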

Similar Papers
  • Research Article
  • Citations: 46
  • 10.1016/j.aei.2023.102196
Improving completeness and accuracy of 3D point clouds by using deep learning for applications of digital twins to civil structures
  • Sep 28, 2023
  • Advanced Engineering Informatics
  • Shihong Chen + 2 more

  • Dissertation
  • 10.14711/thesis-991012786067603412
Learning large-scale multi-view stereopsis
  • Jan 1, 2019
  • Yao Yao

Multi-view stereo (MVS) reconstructs 3D representations of a scene from imagery and is a core problem of computer vision that has been studied extensively for decades. Traditionally, MVS algorithms apply hand-crafted similarity metrics and engineered regularizations to compute dense correspondences. While these methods have shown great results under ideal Lambertian scenarios, classical MVS algorithms still suffer from numerous artifacts. In this thesis, we propose to advance MVS reconstruction using recent deep learning techniques. First, we present an end-to-end deep learning architecture, MVSNet, for depth map inference from multi-view images. The key contribution of this part is the careful integration of multi-view geometry and convolutional neural networks (CNNs). In the network, we extract deep image features and build the 3D cost volume upon the camera frustum via differentiable homography warping. Then, 3D convolutions are applied to regularize and regress the output depth map. We demonstrate on the DTU dataset that MVSNet significantly outperforms previous state-of-the-art methods in both reconstruction completeness and overall quality. Next, we propose to extend the MVSNet architecture to large-scale MVS reconstruction. One major limitation of current learning-based approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes. To this end, we sequentially regularize 2D cost maps via a gated recurrent unit (GRU) rather than regularizing the entire 3D cost volume in one go. The GRU regularization dramatically reduces memory consumption and makes high-resolution reconstructions feasible. The proposed R-MVSNet is evaluated on the large-scale Tanks and Temples dataset and achieves results comparable to classical large-scale MVS algorithms. Finally, we establish a large-scale synthetic MVS dataset, BlendedMVS, based on blended images and rendered depth maps. While several MVS datasets have been proposed, they fail to provide accurate depth and occlusion information because ground truth mesh models are usually incomplete. We therefore establish a new MVS dataset based on model rendering. Textured meshes are first reconstructed from images of different scenes and are then rendered into color images, depth maps and occlusion maps. We further blend the rendered images with the input images using high-pass and low-pass filters to generate our training input. Extensive experiments demonstrate that models trained on BlendedMVS achieve significantly better generalization ability than models trained on other MVS datasets. In sum, this thesis presents a complete learning-based solution to large-scale multi-view stereopsis, including a current baseline network (MVSNet), its large-scale extension (R-MVSNet) and a large-scale synthetic dataset (BlendedMVS). We bridge the gap between classical MVS reconstruction and recent deep learning techniques and demonstrate the effectiveness of learning-based MVS through extensive experiments on different datasets.

  • Research Article
  • Citations: 1
  • 10.1007/s11063-018-9816-6
AIFD Based 2D Image Registration to Multi-View Stereo Mapped 3D Models
  • Mar 14, 2018
  • Neural Processing Letters
  • Biao Zhao

Multi-view stereo (MVS) map-based 3D range reconstruction generates 3D ranges by analyzing snapshots of the surroundings taken from different perspectives. Unlike the traditional method, which employs expensive and hard-to-maintain laser range devices to calibrate the range of real 3D objects, MVS succeeds by seeking the geometrical correlations between correspondences in snapshots from different perspectives. Interest in MVS keeps rising thanks to the fast development of digital maps and 3D printing. Several MVS algorithms have been well developed and have succeeded in reconstructing 3D ranges. However, most of these algorithms focus mainly on the fusion and merging of different scenes and on surface refinement. Because feature matching algorithms cope poorly with affine-transformed images, current MVS algorithms need a huge number of images with tiny perspective differences. In this paper, we propose a new MVS algorithm that deploys our previously published Affine Invariant Feature Descriptor (AIFD) to detect and match correspondences from different perspectives and applies a homography matrix and segmentation to define the planes of the objects. Thanks to the AIFD and the homography-based projection model, our proposed MVS algorithm outperforms other MVS algorithms in terms of speed and efficiency.

  • Research Article
  • Citations: 32
  • 10.1088/1757-899x/1073/1/012066
3D reconstruction using Structure From Motion (SFM) algorithm and Multi View Stereo (MVS) based on computer vision
  • Feb 1, 2021
  • IOP Conference Series: Materials Science and Engineering
  • M Kholil + 2 more

With the development of the Information and Computer Technology (ICT) sector, three-dimensional (3D) technology is also growing rapidly. Currently, 3D object visualization is widely used in animation and graphic applications, architecture, education, cultural recognition and Virtual Reality. 3D modeling of historic buildings has become a concern in recent years. 3D reconstruction is an attempt to document a building so that it can be reconstructed or restored if it is destroyed. By reconstructing 3D models using the Structure from Motion (SFM) and Multi View Stereo (MVS) algorithms based on computer vision, it is hoped that the results of this 3D modeling can be used to help preserve 3D objects in the Penataran Temple cultural heritage area. This research was conducted by taking 61 images of objects in the Penataran Temple area in Blitar. The photos obtained were reconstructed into a 3D model using the Structure From Motion algorithm in Meshroom. In this research, a trial with the original images and compressed images is used to compare the 3D reconstruction process for the two sets of input data. From the 61 images processed using the Structure From Motion algorithm, 33 camera poses and 3D points were improved for both the original and the compressed images. The compressed images require 1.4% fewer iterations than the original images and are processed 43.53% faster.

  • Research Article
  • Citations: 58
  • 10.3390/buildings9030070
Automated Progress Controlling and Monitoring Using Daily Site Images and Building Information Modelling
  • Mar 20, 2019
  • Buildings
  • Hadi Mahami + 3 more

This research presents a novel method for automated construction progress monitoring. Using the proposed method, an accurate and complete 3D point cloud is generated for automatic outdoor and indoor progress monitoring throughout the project duration. In this method, Structure-from-Motion (SFM) and Multi-View-Stereo (MVS) algorithms coupled with photogrammetric principles for the detection of coded targets are exploited to generate as-built 3D point clouds. The coded targets are used to automatically resolve the scale and increase the accuracy of the point cloud generated using SFM and MVS methods. Once the point cloud has been generated, a CAD model is generated from the as-built point cloud and compared with the as-planned model. Finally, the quantity of the performed work is determined in two real case study projects. The proposed method is compared to the Structure-from-Motion (SFM)/Clustering Views for Multi-View Stereo (CMVS)/Patch-based Multi-View Stereo (PMVS) algorithm, a common method for generating 3D point cloud models. The proposed photogrammetric Multi-View Stereo method achieves an accuracy of around 99 percent and generates less noise than the SFM/CMVS/PMVS algorithm. It is observed that the proposed method substantially improves the accuracy of the generated point cloud compared to the SFM/CMVS/PMVS algorithm. It is believed that the proposed method may provide a novel and robust tool for automated progress monitoring in construction projects.

  • Research Article
  • Citations: 37
  • 10.3390/rs8050381
A Multi-View Dense Point Cloud Generation Algorithm Based on Low-Altitude Remote Sensing Images
  • May 4, 2016
  • Remote Sensing
  • Zhenfeng Shao + 4 more

This paper presents a novel multi-view dense point cloud generation algorithm based on low-altitude remote sensing images. The proposed method was designed to be especially effective in enhancing the density of point clouds generated by Multi-View Stereo (MVS) algorithms. To overcome the limitations of MVS and dense matching algorithms, an expanded patch was set up for each point in the point cloud. Then, patch-based Multiphoto Geometrically Constrained Matching (MPGC) was employed to optimize points on the patch based on least-squares adjustment, the spatial geometry relationship, and the epipolar line constraint. The major advantages of this approach are twofold: (1) compared with the MVS method, the proposed algorithm can achieve denser three-dimensional (3D) point cloud data; and (2) compared with the epipolar-based dense matching method, the proposed method uses redundant measurements to weaken the influence of occlusion and noise on matching results. Comparison studies and experimental results have validated the accuracy of the proposed algorithm in dense point cloud generation from low-altitude remote sensing images.

  • Book Chapter
  • Citations: 2
  • 10.1007/978-3-642-38267-3_20
Weighted Patch-Based Reconstruction: Linking (Multi-view) Stereo to Scale Space
  • Jan 1, 2013
  • Ronny Klowsky + 2 more

Surface reconstruction using patch-based multi-view stereo commonly assumes that the underlying surface is locally planar. This is typically not true, so that least-squares fitting of a planar patch leads to systematic errors which are of particular importance for multi-scale surface reconstruction. In a recent paper [12], we determined the modulation transfer function of a classical patch-based stereo system. Our key insight was that the reconstructed surface is a box-filtered version of the original surface. Since the box filter is not a true low-pass filter, this causes high-frequency artifacts. In this paper, we propose an extended reconstruction model by weighting the least-squares fit of the 3D patch. We show that if the weighting function meets specified criteria, the reconstructed surface is the convolution of the original surface with that weighting function. A choice of particular interest is the Gaussian, which is commonly used in image and signal processing but left unexploited by many multi-view stereo algorithms. Finally, we demonstrate the effects of our theoretical findings using experiments on synthetic and real-world data sets. Keywords: multi-view stereo, multi-scale surface reconstruction

  • Research Article
  • Citations: 90
  • 10.3390/app122412886
UAV Platforms and the SfM-MVS Approach in the 3D Surveys and Modelling: A Review in the Cultural Heritage Field
  • Dec 15, 2022
  • Applied Sciences
  • Massimiliano Pepe + 2 more

In recent years, structure from motion (SfM) and multi-view stereo (MVS) algorithms have been successfully applied to stereo images generated by cameras mounted on unmanned aerial vehicle (UAV) platforms to build 3D models. Indeed, the approach based on the combination of SfM-MVS and UAV-generated images allows for cost-effective acquisition, fast and automated processing, and detailed and accurate reconstruction of 3D models. As a consequence, this approach has become very popular for representation, management, and conservation in the field of cultural heritage (CH). Therefore, this review paper discusses the use of UAV photogrammetry in CH environments with a focus on state-of-the-art trends and best practices in image acquisition technologies and 3D model-building software. In particular, this paper intends to emphasise the different techniques of image acquisition and processing in relation to the different platforms and navigation systems available, as well as to analyse in depth the aspects of 3D reconstruction that efficiently describe the entire photogrammetric process, providing further insights for new applications in different fields, such as structural engineering and the conservation, maintenance, and restoration of sites and structures belonging to the CH field.

  • Research Article
  • Citations: 13
  • 10.1587/transinf.2014edp7409
Phase-Based Window Matching with Geometric Correction for Multi-View Stereo
  • Jan 1, 2015
  • IEICE Transactions on Information and Systems
  • Shuji Sakai + 4 more

Window matching methods for estimating 3D points are the most significant factors affecting the accuracy, robustness, and computational cost of Multi-View Stereo (MVS) algorithms. Most existing MVS algorithms employ window matching based on Normalized Cross-Correlation (NCC) to estimate the depth of a 3D point. NCC-based window matching estimates the displacement between matching windows with sub-pixel accuracy by linear/cubic interpolation, which does not represent accurate sub-pixel values of matching windows. This paper proposes a highly accurate window matching technique for MVS that uses Phase-Only Correlation (POC) with geometric correction. The accurate sub-pixel displacement between two matching windows can be estimated by fitting the analytical correlation peak model of the POC function. The proposed method also corrects the geometric transformations of matching windows by taking into consideration the 3D shape of the target object. The proposed geometric correction approach makes it possible to achieve accurate 3D reconstruction from multi-view images even for images with large transformations. The proposed method demonstrates more accurate 3D reconstruction from multi-view images than conventional methods.
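
As a point of reference for the NCC baseline mentioned in this abstract, the following minimal sketch computes the zero-mean normalized cross-correlation score between two windows. It assumes the windows are already flattened into equal-length lists of intensities, and the function name `ncc` is chosen here for illustration. It covers only the similarity score, not the paper's POC method or any sub-pixel interpolation.

```python
import math
from typing import Sequence


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross-correlation of two flattened windows.

    Returns a score in [-1, 1]; 1 means the windows match up to an
    affine change in brightness (gain and offset).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("windows must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Subtract the mean so a constant brightness offset does not affect the score.
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a textureless window has no defined correlation; report 0
    return sum(x * y for x, y in zip(da, db)) / denom


# A window compared with a brighter, higher-contrast copy of itself scores about 1.
score = ncc([1, 2, 3, 4], [12, 14, 16, 18])
```

Because the score is invariant to gain and offset, it tolerates exposure differences between views, which is one reason NCC is a common matching cost in MVS.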

  • Research Article
  • Citations: 72
  • 10.1016/j.patcog.2019.107112
Depth-map completion for large indoor scene reconstruction
  • Nov 14, 2019
  • Pattern Recognition
  • Hongmin Liu + 2 more

  • Research Article
  • Citations: 1
  • 10.3389/fpls.2025.1610577
Research on cotton plant type identification method based on multidimensional vision
  • Oct 13, 2025
  • Frontiers in Plant Science
  • Ying Liu + 6 more

Introduction: Plant type is an important part of plant phenotypic research and is of great significance for practical applications such as plant genomics and cultivation knowledge modeling. Existing plant type judgment relies mainly on subjective experience and lacks automatic analysis and identification methods, which seriously restricts progress in efficient crop breeding and precision cultivation. Methods: In this study, a digital structure model of the cotton plant was constructed based on multi-dimensional vision, and a rapid analysis and identification method for cotton plant type was established. Fifty cotton plants were used as experimental objects. First, multi-view images of cotton plants at the boll opening stage were collected, and a three-dimensional point cloud model of the cotton plants was constructed using the Structure From Motion and Multi View Stereo (SFM-MVS) algorithm. The original cotton point cloud data were preprocessed by coordinate correction, statistical filtering, conditional filtering and down-sampling to obtain a high-quality three-dimensional model. The three-dimensional model was then projected into two dimensions to obtain two-dimensional projection data of the cotton plants from multiple perspectives. Second, based on the fast convex hull algorithm, two-dimensional convex hulls of the cotton plants were constructed from multiple perspectives, the distribution range and corner change rate of each corner of the convex hull were analyzed, and the identification basis for cotton plant type was established. Results: The R² values of plant height and width extracted from the model were greater than 0.90, and the RMSEs were 0.372 cm and 0.387 cm, respectively. When the maximum number of points in the point cloud is 75335, the point cloud reading time, cotton multi-view projection time, and automatic convex hull construction time are 0.402 s, 2.275 s, and 0.018 s, respectively. Finally, the classification interval for the cylinder type of cotton is 0-0.2, and that for the tower type is 0.4-1.5. Discussion: The cotton plant type identification method proposed in this study is fast and efficient. It provides a solid theoretical basis and technical support for cotton plant type identification.

  • Conference Article
  • Citations: 5
  • 10.1145/3384382.3384530
User-guided 3D reconstruction using multi-view stereo
  • May 5, 2020
  • Sverker Rasmuson + 2 more

We present a user-guided system for accessible 3D reconstruction and modeling of real-world objects using multi-view stereo. The system is an interactive tool where the user models the object on top of multiple selected photographs. Our tool helps the user place quads correctly aligned to the photographs using a multi-view stereo algorithm. This algorithm, in combination with user-provided information about topology, visibility, and how to separate foreground from background, creates favorable conditions for successfully reconstructing the object. The user only needs to manually specify a coarse topology which, followed by subdivision and a global optimization algorithm, creates an accurate model with the desired mesh density. This global optimization algorithm has a higher probability of converging to an accurate result than a fully automatic system. With our proposed tool, we lower the barrier to entry for creating high-quality 3D reconstructions of real-world objects with a desirable topology. Our interactive tool delegates the most tedious and difficult parts of modeling to the computer, while giving the user control over the most common robustness issues in automatic 3D reconstruction. The provided workflow can be a preferable alternative to using automatic scanning techniques followed by re-topologization.

  • Conference Article
  • Citations: 42
  • 10.1109/cvpr.2008.4587688
Image selection for improved Multi-View Stereo
  • Jun 1, 2008
  • Alexander Hornung + 2 more

The Middlebury multi-view stereo evaluation clearly shows that the quality and speed of most multi-view stereo algorithms depend significantly on the number and selection of input images. In general, not all input images contribute equally to the quality of the output model, since several images may often contain similar and hence overly redundant visual information. This leads to unnecessarily increased processing times. On the other hand, a certain degree of redundancy can help to improve the reconstruction in more "difficult" regions of a model. In this paper we propose an image selection scheme for multi-view stereo which results in improved reconstruction quality compared to uniformly distributed views. Our method is tuned towards the typical requirements of current multi-view stereo algorithms, and is based on the idea of incrementally selecting images so that the overall coverage of a simultaneously generated proxy is guaranteed without adding too much redundant information. Critical regions such as cavities are detected by an estimate of the local photo-consistency and are improved by adding additional views. Our method is highly efficient, since most computations can be outsourced to the GPU. We evaluate our method with four different methods participating in the Middlebury benchmark and show that in each case reconstructions based on our selected images yield an improved output quality while at the same time reducing the processing time considerably.

  • Research Article
  • Citations: 6
  • 10.1080/00405000.2021.1882071
A novel objective wrinkle evaluation method for printed fabrics based on multi-view stereo algorithm
  • Feb 23, 2021
  • The Journal of The Textile Institute
  • Na Deng + 3 more

Traditionally, fabric wrinkle assessment is performed by eye, which is subjective and suffers from low efficiency and accuracy. In this paper, an objective fabric wrinkle evaluation method is proposed to solve the problems caused by subjective evaluation. A self-developed multiple-image acquisition system was established to capture images of wrinkled fabrics from different directions. The three-dimensional (3D) surface profile of the wrinkled fabric was reconstructed by an improved patch-based multi-view stereo vision algorithm. A two-dimensional depth image was generated directly from the 3D point cloud model, and four texture feature parameters were then extracted from the depth image using the gray level co-occurrence matrix. Finally, these four feature parameters were used as the input vector and the wrinkle grade as the output of a support vector machine for the objective assessment of fabric wrinkle appearance. Our experimental results indicated that the recognition accuracy of the proposed method and system was more than 90%. The originality of our research is that wrinkle features based on 3D surface profile reconstruction can avoid interference caused by fabric color and texture and further improve the recognition accuracy of objective evaluation.

  • Book Chapter
  • Citations: 1
  • 10.4018/978-1-4666-3994-2.ch009
Multi-View Stereo Reconstruction Technique
  • Jan 1, 2013
  • Peng Song + 1 more

3D modeling of complex objects is an important task of computer graphics and poses substantial difficulties to traditional synthetic modeling approaches. The multi-view stereo reconstruction technique, which tries to automatically acquire object models from multiple photographs, provides an attractive alternative. The whole reconstruction process of the multi-view stereo technique is introduced in this chapter, from camera calibration and image acquisition to various reconstruction algorithms. The shape from silhouette technique is also introduced since it provides a close shape approximation for many multi-view stereo algorithms. Various multi-view algorithms have been proposed, which can be mainly classified into four classes: 3D volumetric, surface evolution, feature extraction and expansion, and depth map based approaches. This chapter explains the underlying theory and pipeline of each class in detail and analyzes their major properties. Two published benchmarks that are used to qualitatively evaluate multi-view stereo algorithms are presented, along with the benchmark criteria and evaluation results.
