3D RECONSTRUCTION FROM MULTI-VIEW GOOGLE EARTH SATELLITE STEREO IMAGES BY GENERATING VIRTUAL RPC BASED ON 3D HOMOGRAPHY-BASED GEOREFERENCING
Abstract. In this paper, we propose a method for performing 3D reconstruction by generating virtual RPC parameters from multi-view satellite stereo images provided by Google Earth (GE) software. In general multi-view stereo (MVS), a dense 3D surface can be reconstructed once the pose and parameters of the cameras have been estimated. For satellite images, however, it is not easy to obtain the original images of an area of interest together with their pose parameters. GE software can provide images across the globe, but those images are georeferenced and modified to fit ground control points (GCPs), so no camera model is available to explain the projection relationship. The purpose of the proposed method is therefore to perform 3D reconstruction by generating virtual camera parameters for the modified satellite images obtained from GE software. In the proposed method, the satellite images obtained from GE are treated as pinhole images, and structure from motion (SfM) is used for an initial reconstruction. After the initial reconstruction, the 3D model is transformed from a distorted hexahedral space formed along the pixel rays to the metric space of the UTM coordinate system through 3D homography-based georeferencing. Virtual rational polynomial camera (RPC) parameters are then calculated from correspondences between the satellite images and 3D points in UTM coordinates. The final result is generated by an MVS method that uses the RPC model with the virtual RPC. The DSM reconstructed using the virtual RPC improves on the initial reconstruction, and error measurement in the area with ground truth (GT) gave a mean absolute error (MAE) of 1.366 m on average.
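The MAE reported above is an average absolute height difference taken only over the area where ground truth exists. A minimal sketch of that calculation follows, assuming both DSMs are already resampled to the same grid and that cells without ground truth are marked with NaN. The function name and the toy arrays are illustrative and do not come from the paper.

```python
import numpy as np

def dsm_mae(pred, gt):
    """Mean absolute height error in metres over cells with valid ground truth."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = np.isfinite(pred) & np.isfinite(gt)  # skip cells lacking GT or a prediction
    return float(np.mean(np.abs(pred[valid] - gt[valid])))

gt = np.array([[10.0, 12.0], [np.nan, 15.0]])    # toy GT heights, NaN = no GT
pred = np.array([[11.0, 10.0], [14.0, 15.0]])    # toy reconstructed heights
mae = dsm_mae(pred, gt)                          # absolute errors 1, 2, 0 on three valid cells
```

Masking before averaging matters here: the cell with a prediction but no ground truth would otherwise turn the whole mean into NaN.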
- Research Article
7
- 10.1080/01431161.2023.2214275
- May 3, 2023
- International Journal of Remote Sensing
This paper proposes a novel approach for automatic 3D surface reconstruction from uncalibrated and multi-view Google Earth images by using a multi-view stereo method and 3D projective to metric transformation. Without the Rational Polynomial Coefficients, it is impossible to obtain the metric reconstruction of the 3D surface from multi-view satellite images. We solve the uncalibrated multi-view satellite image problem by employing a multi-view stereo vision technique followed by a projective to metric transformation. The virtual pose parameters of the satellite images are obtained by using COLMAP, and the virtual 3D projective reconstruction is done by using EnSoft3D. For projective to metric transformation, we propose to employ 3D homography transformation. Eight 3D correspondence pairs on the viewing frustums between the virtual reference camera and the ideal nadir camera are used to derive a 3D homography matrix. Using the 3D homography matrix, we finally obtain the metric reconstruction of 3D surface up to an unknown height scale in a reference coordinate system of the Google Earth desktop software. Experiments are done in several world locations on Google Earth including building and vegetation areas. Reconstruction error analysis with the Data Fusion Contest 19 dataset is also presented. The average of MAE and RMSE of five tile regions in the dataset are 1.596 m and 2.083 m, respectively.
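The 3D homography step described above can be sketched with a direct linear transform (DLT). This is a generic estimator under stated assumptions, not the authors' implementation: the eight source points are the corners of a toy viewing frustum, the target points are produced by a made-up matrix `H_true`, and all function names are hypothetical.

```python
import numpy as np

def estimate_homography_3d(src, dst):
    """Estimate the 4x4 matrix H with dst ~ H @ src (homogeneous) via DLT."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_h = np.hstack([src, np.ones((len(src), 1))])
    rows = []
    for X, Y in zip(src_h, dst):
        for i in range(3):
            # Y[i] * (h4 . X) - (h_i . X) = 0 for each output coordinate i
            row = np.zeros(16)
            row[4 * i:4 * i + 4] = -X
            row[12:16] = Y[i] * X
            rows.append(row)
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(4, 4)      # null vector of the system; overall scale is arbitrary

def apply_homography_3d(H, pts):
    """Map (N, 3) points through H and divide by the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :3] / ph[:, 3:4]

# Eight corners of a toy frustum: a near face at z = 1 and a far face at z = 3.
src = np.array([[sx * z, sy * z, z]
                for z in (1.0, 3.0) for sx in (-1, 1) for sy in (-1, 1)])
H_true = np.array([[2.0, 0.0, 0.5, 10.0],
                   [0.0, 2.0, 0.3, 20.0],
                   [0.0, 0.0, 1.5, 5.0],
                   [0.0, 0.0, 0.1, 1.0]])   # made-up projective map for the demo
dst = apply_homography_3d(H_true, src)
H_est = estimate_homography_3d(src, dst)
```

Each pair contributes three equations, so eight pairs give 24 equations for the 15 degrees of freedom of a 3D homography. With coordinates as large as real UTM values, the points should be centred and scaled before the SVD; that normalization is omitted here for brevity.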
- Research Article
1
- 10.3390/rs16203863
- Oct 17, 2024
- Remote Sensing
In this paper, we propose a 3D Digital Surface Model (DSM) reconstruction method from uncalibrated Multi-view Satellite Stereo (MVSS) images, where Rational Polynomial Coefficient (RPC) sensor parameters are not available. While recent investigations have introduced several techniques to reconstruct high-precision and high-density DSMs from MVSS images, they inherently depend on the use of geo-corrected RPC sensor parameters. However, RPC parameters from satellite sensors can be erroneous because of inaccurate sensor data. In addition, with the increasing availability of data on the internet, uncalibrated satellite images can easily be obtained without RPC parameters. This study proposes a novel method to reconstruct a 3D DSM from uncalibrated MVSS images by estimating and integrating RPC parameters. To do this, we first employ structure from motion (SfM) and a 3D homography-based geo-referencing method to reconstruct an initial DSM. Second, we sample 3D points from the initial DSM as references and reproject them to the 2D image space to determine 3D–2D correspondences. Using the correspondences, we directly calculate all RPC parameters. To overcome memory shortages when processing large satellite images, we also propose an RPC integration method. The image space is partitioned into multiple tiles, and RPC estimation is performed independently in each tile. Then, all tiles’ RPCs are integrated into the final RPC to represent the geometry of the whole image space. Finally, the integrated RPC is used to run a true MVSS pipeline to obtain the 3D DSM. The experimental results show that the proposed method can achieve a Mean Absolute Error (MAE) of 1.455 m in height map reconstruction on multi-view satellite benchmark datasets. We also show that the proposed method can be used to reconstruct a geo-referenced 3D DSM from uncalibrated and freely available Google Earth imagery.
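Calculating RPC parameters from 3D–2D correspondences can be illustrated with a linearized least-squares fit. The sketch below is one common way to set the problem up, not necessarily the authors' solver. It assumes the 3D points are already normalized to roughly [-1, 1], fits one image coordinate at a time, fixes the constant term of the denominator to 1, and uses its own monomial ordering. The tiling and RPC integration steps are not covered, and all names and synthetic data are hypothetical.

```python
import numpy as np

def cubic_terms(P):
    """All 20 monomials x^i y^j z^k with i+j+k <= 3; column 0 is the constant 1."""
    x, y, z = P[:, 0], P[:, 1], P[:, 2]
    return np.stack([x**i * y**j * z**k
                     for i in range(4)
                     for j in range(4 - i)
                     for k in range(4 - i - j)], axis=1)

def fit_rpc_axis(P, r):
    """Fit r ~ num(P) / den(P) for one image coordinate, with den[0] fixed to 1."""
    M = cubic_terms(P)
    # num(P) - r * (den(P) - 1) = r is linear in the 20 + 19 unknown coefficients.
    A = np.hstack([M, -r[:, None] * M[:, 1:]])
    coef, *_ = np.linalg.lstsq(A, r, rcond=None)
    return coef[:20], np.concatenate([[1.0], coef[20:]])

def eval_rpc_axis(num, den, P):
    """Evaluate the fitted rational cubic at points P."""
    M = cubic_terms(P)
    return (M @ num) / (M @ den)

# Synthetic correspondences generated from a made-up rational cubic.
rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, (500, 3))
num_true = rng.normal(size=20)
den_true = np.concatenate([[1.0], 0.01 * rng.normal(size=19)])
r = eval_rpc_axis(num_true, den_true, P)
num, den = fit_rpc_axis(P, r)
```

Running the fit once for the row coordinate and once for the column coordinate yields the four coefficient sets of a full RPC. Real correspondences are noisy, so practical solvers usually add regularization, which this sketch leaves out.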
- Conference Article
7
- 10.1109/iccv48922.2021.00612
- Oct 1, 2021
While learning-based multi-view stereo (MVS) methods have recently shown successful performance in quality and efficiency, limited MVS data hampers generalization to unseen environments. A simple solution is to generate various large-scale MVS datasets, but generating dense ground truth for 3D structure requires a huge amount of time and resources. On the other hand, if the reliance on dense ground truth is relaxed, MVS systems will generalize more smoothly to new environments. To this end, we first introduce a novel semi-supervised multi-view stereo framework called a Sparse Ground truth-based MVS Network (SGT-MVSNet) that can reliably reconstruct the 3D structures even with a few ground truth 3D points. Our strategy is to divide the accurate and erroneous regions and individually conquer them based on our observation that a probability map can separate these regions. We propose a self-supervision loss called the 3D Point Consistency Loss to enhance the 3D reconstruction performance, which forces the 3D points back-projected from the corresponding pixels by the predicted depth values to meet at the same 3D coordinates. Finally, we propagate these improved depth predictions toward edges and occlusions by the Coarse-to-fine Reliable Depth Propagation module. We generate the sparse ground truth of the DTU dataset for evaluation, and extensive experiments verify that our SGT-MVSNet outperforms the state-of-the-art MVS methods on the sparse ground truth setting. Moreover, our method shows comparable reconstruction results to the supervised MVS methods, though we only used tens and hundreds of ground truth 3D points.
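The idea behind the 3D Point Consistency Loss, that corresponding pixels back-projected with their predicted depths should land on the same 3D point, can be sketched geometrically. The abstract does not give the exact formulation, so the code below is only an interpretation: it assumes pinhole cameras with world-to-camera pose `x_cam = R X + t`, takes the mean Euclidean distance between the two back-projections, and uses hypothetical names and toy cameras.

```python
import numpy as np

def project(X, K, R, t):
    """World points (N, 3) -> pixel coordinates (N, 2) and depths (N,)."""
    cam = X @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3], cam[:, 2]

def backproject(uv, depth, K, R, t):
    """Pixels (N, 2) with depths (N,) -> world points (N, 3)."""
    uv1 = np.hstack([uv, np.ones((len(uv), 1))])
    cam = (uv1 @ np.linalg.inv(K).T) * depth[:, None]   # points in the camera frame
    return (cam - t) @ R                                 # row form of R^T (x_cam - t)

def point_consistency(uv_ref, d_ref, cam_ref, uv_src, d_src, cam_src):
    """Mean distance between 3D points back-projected from corresponding pixels."""
    X_ref = backproject(uv_ref, d_ref, *cam_ref)
    X_src = backproject(uv_src, d_src, *cam_src)
    return float(np.mean(np.linalg.norm(X_ref - X_src, axis=1)))

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cam_ref = (K, np.eye(3), np.zeros(3))
cam_src = (K, np.eye(3), np.array([-0.5, 0.0, 0.0]))     # toy sideways baseline
X = np.array([[0.0, 0.0, 4.0], [1.0, -0.5, 5.0], [-0.8, 0.3, 6.0]])
uv_ref, d_ref = project(X, *cam_ref)
uv_src, d_src = project(X, *cam_src)
loss_exact = point_consistency(uv_ref, d_ref, cam_ref, uv_src, d_src, cam_src)
loss_noisy = point_consistency(uv_ref, d_ref, cam_ref, uv_src, d_src + 0.1, cam_src)
```

With correct depths the two back-projections coincide and the value is essentially zero. A depth error in either view moves its point along the viewing ray and raises the value, which is what makes the quantity usable as a self-supervision signal.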
- Research Article
4
- 10.4218/etrij.2021-0305
- Jun 23, 2022
- ETRI Journal
The learning-based multiview stereo (MVS) methods for three-dimensional (3D) reconstruction generally use 3D volumes for depth inference. The quality of the reconstructed depth maps and the corresponding point clouds is directly influenced by the spatial resolution of the 3D volume. Consequently, these methods produce point clouds with sparse local regions because of the lack of the memory required to encode a high volume of information. Here, we apply the atrous spatial pyramid pooling (ASPP) module in MVS methods to obtain dense feature maps with multiscale, long-range, contextual information using high receptive fields. For a given 3D volume with the same spatial resolution as that in the MVS methods, the dense feature maps from the ASPP module encoded with superior information can produce dense point clouds without a high memory footprint. Furthermore, we propose a 3D loss for training the MVS networks, which improves the predicted depth values by 24.44%. The ASPP module provides state-of-the-art qualitative results by constructing relatively dense point clouds, which improves the DTU MVS dataset benchmarks by 2.25% compared with those achieved in the previous MVS methods.
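The way atrous (dilated) convolution widens the receptive field without adding weights can be shown with a toy 1D example. This is a simplified illustration, not the network described above: a real ASPP module works on 2D feature maps with separately learned kernels per branch, while this sketch reuses one hand-set kernel, and the function names are hypothetical.

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """Same-size 1D convolution whose taps are spaced `rate` samples apart."""
    half = len(w) // 2
    xp = np.pad(x, rate * half)                      # zero padding on both sides
    return np.array([sum(w[k] * xp[i + k * rate] for k in range(len(w)))
                     for i in range(len(x))])

def aspp_1d(x, w, rates=(1, 2, 4)):
    """Run the kernel at several dilation rates and stack the parallel branches."""
    return np.stack([atrous_conv1d(x, w, r) for r in rates])

x = np.zeros(17)
x[8] = 1.0                                           # unit impulse at the centre
branches = aspp_1d(x, np.ones(3))                    # one row per dilation rate
```

Each branch responds to the impulse at offsets of minus rate, zero and plus rate, so a three-tap kernel at rate 4 spans nine samples while still holding only three weights.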
- Research Article
11
- 10.1007/s00371-013-0827-z
- May 22, 2013
- The Visual Computer
There are three main approaches for reconstructing 3D models of buildings. Laser scanning is accurate but expensive and limited by the laser's range. Structure-from-motion (SfM) and multi-view stereo (MVS) recover 3D point clouds from multiple views of a building. MVS methods, especially patch-based MVS, can achieve higher density than SfM methods. Sophisticated algorithms need to be applied to the point clouds to construct mesh surfaces. The recovered point clouds can be sparse in areas that lack features for accurate reconstruction, making recovery of complete surfaces difficult. Moreover, segmentation of the building's surfaces from surrounding surfaces almost always requires some form of manual input, diminishing the ease of practical application of automatic 3D reconstruction algorithms. This paper presents an alternative approach for reconstructing textured mesh surfaces from point clouds recovered by a patch-based MVS method. To a good first approximation, a building's surfaces can be modeled by planes or curved surfaces that are fitted to the point cloud. 3D points are then resampled on the fitted surfaces in an orderly pattern, and their colors are obtained from the input images. This approach is simple, inexpensive, and effective for reconstructing textured mesh surfaces of large buildings. Test results show that the reconstructed 3D models are sufficiently accurate and realistic for 3D visualization in various applications.
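Fitting a plane to a point cloud and resampling points on it in a regular pattern can be sketched as follows. The abstract does not say which fitting method the paper uses, so this assumes a total least-squares plane obtained from an SVD of the centred points. Colour lookup from the input images is omitted, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def fit_plane(points):
    """Total least-squares plane: returns (centroid, unit normal)."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]               # direction of least variance

def resample_on_plane(centroid, normal, half_size, step):
    """Regular grid of 3D points lying on the fitted plane."""
    helper = np.array([1.0, 0.0, 0.0]) if abs(normal[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(normal, helper)
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)               # u and v span the plane
    ticks = np.arange(-half_size, half_size + 1e-9, step)
    a, b = np.meshgrid(ticks, ticks)
    return centroid + a.reshape(-1, 1) * u + b.reshape(-1, 1) * v

# Noisy samples of the made-up plane z = 0.1 x + 0.2 y + 3.
rng = np.random.default_rng(0)
xy = rng.uniform(-5.0, 5.0, (200, 2))
z = 0.1 * xy[:, 0] + 0.2 * xy[:, 1] + 3.0 + rng.normal(0.0, 0.001, 200)
centroid, normal = fit_plane(np.column_stack([xy, z]))
grid = resample_on_plane(centroid, normal, half_size=1.0, step=0.5)
```

The resampled grid is dense and evenly spaced even where the original cloud was sparse, which is the property that makes meshing and texturing straightforward afterwards.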
- Conference Article
57
- 10.1109/iccv.2019.00114
- Oct 1, 2019
Highly accurate 3D volumetric reconstruction is still an open research topic where the main difficulty is usually related to merging some rough estimations with high frequency details. One of the most promising methods is the fusion between multi-view stereo and photometric stereo images. Besides the intrinsic difficulties that multi-view stereo and photometric stereo face in order to work reliably, supplementary problems arise when they are considered together. In this work, we present a volumetric approach to the multi-view photometric stereo problem. The key point of our method is the signed distance field parameterisation and its relation to the surface normal. This is exploited in order to obtain a linear partial differential equation which is solved in a variational framework that combines multiple images from multiple points of view in a single system. In addition, the volumetric approach is naturally implemented on an octree, which allows for fast ray-tracing that reliably alleviates occlusions and cast shadows. Our approach is evaluated on synthetic and real datasets and achieves state-of-the-art results.
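The relation between the signed distance field and the surface normal mentioned above can be written out as a short derivation. This is a sketch under assumptions the abstract does not state: $d$ denotes the signed distance function, reflectance is taken to be Lambertian with albedo $\rho$, and $\mathbf{l}_1, \mathbf{l}_2$ are two known light directions producing intensities $i_1, i_2$ at the same surface point.

```latex
\begin{align}
\mathbf{n}(\mathbf{x}) &= \nabla d(\mathbf{x}), \qquad \lVert \nabla d(\mathbf{x}) \rVert = 1, \\
i_k(\mathbf{x}) &= \rho(\mathbf{x}) \, \mathbf{l}_k \cdot \nabla d(\mathbf{x}), \qquad k = 1, 2, \\
\bigl( i_1(\mathbf{x}) \, \mathbf{l}_2 - i_2(\mathbf{x}) \, \mathbf{l}_1 \bigr) \cdot \nabla d(\mathbf{x}) &= 0 .
\end{align}
```

The last line follows from the second by eliminating $\rho$ between the two images. It is linear in $\nabla d$, which is the kind of linear partial differential equation the abstract refers to, although the paper's actual irradiance model may be more general.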
- Conference Article
33
- 10.1109/iccv48922.2021.00609
- Oct 1, 2021
Satellite multi-view stereo (MVS) imagery is particularly suited for large-scale Earth surface reconstruction. Differing from the perspective camera model (pin-hole model) that is commonly used for close-range and aerial cameras, the cubic rational polynomial camera (RPC) model is the mainstream model for push-broom linear-array satellite cameras. However, the homography warping used in the prevailing learning based MVS methods is only applicable to pin-hole cameras. In order to apply the SOTA learning based MVS technology to the satellite MVS task for large-scale Earth surface reconstruction, RPC warping should be considered. In this work, we propose, for the first time, a rigorous RPC warping module. The rational polynomial coefficients are recorded as a tensor, and the RPC warping is formulated as a series of tensor transformations. Based on the RPC warping, we propose the deep learning based satellite MVS (SatMVS) framework for large-scale and wide depth range Earth surface reconstruction. We also introduce a large-scale satellite image dataset consisting of 519 5120×5120 images, which we call the TLC SatMVS dataset. The satellite images were acquired from a three-line camera (TLC) that catches triple-view images simultaneously, forming a valuable supplement to the existing open-source WorldView-3 datasets with single-scanline images. Experiments show that the proposed RPC warping module and the SatMVS framework can achieve a superior reconstruction accuracy compared to the pin-hole fitting method and conventional MVS methods. Code and data are available at https://github.com/WHU-GPCV/SatMVS.
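For readers unfamiliar with the cubic RPC model, the forward projection from geodetic coordinates to image coordinates is a ratio of two 20-term cubic polynomials in normalized latitude, longitude and height. The sketch below evaluates it in vectorized form. It is not the SatMVS warping module: the dictionary keys are illustrative names, the coefficient values are toy numbers, and the term ordering follows the RPC00B convention as commonly documented.

```python
import numpy as np

def rpc_terms(P, L, H):
    """20 cubic terms in RPC00B order (P = lat, L = lon, H = height, all normalized)."""
    one = np.ones_like(P)
    return np.stack([one, L, P, H, L * P, L * H, P * H, L * L, P * P, H * H,
                     P * L * H, L**3, L * P * P, L * H * H, L * L * P,
                     P**3, P * H * H, L * L * H, P * P * H, H**3], axis=-1)

def rpc_project(lat, lon, h, rpc):
    """Map equally shaped arrays of lat, lon, height to (line, sample) pixels."""
    P = (np.asarray(lat, dtype=float) - rpc["lat_off"]) / rpc["lat_scale"]
    L = (np.asarray(lon, dtype=float) - rpc["lon_off"]) / rpc["lon_scale"]
    H = (np.asarray(h, dtype=float) - rpc["h_off"]) / rpc["h_scale"]
    T = rpc_terms(P, L, H)
    line = (T @ rpc["line_num"]) / (T @ rpc["line_den"])
    samp = (T @ rpc["samp_num"]) / (T @ rpc["samp_den"])
    return (line * rpc["line_scale"] + rpc["line_off"],
            samp * rpc["samp_scale"] + rpc["samp_off"])

def unit(i):
    """Coefficient vector that selects a single term (toy values only)."""
    e = np.zeros(20)
    e[i] = 1.0
    return e

rpc = {"lat_off": 37.0, "lat_scale": 0.1, "lon_off": 127.0, "lon_scale": 0.1,
       "h_off": 100.0, "h_scale": 500.0,
       "line_off": 2560.0, "line_scale": 2560.0,
       "samp_off": 2560.0, "samp_scale": 2560.0,
       "line_num": unit(2), "line_den": unit(0),     # line depends on P only
       "samp_num": unit(1), "samp_den": unit(0)}     # sample depends on L only
line, samp = rpc_project(np.array([37.0, 37.1]), np.array([127.0, 127.0]),
                         np.array([100.0, 100.0]), rpc)
```

Because the whole computation is a stack of monomials followed by matrix products, it maps directly onto batched tensor operations, which is the property an RPC warping layer relies on.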
- Research Article
3
- 10.1109/tpami.2025.3597148
- Nov 1, 2025
- IEEE transactions on pattern analysis and machine intelligence
To reconstruct the 3D geometry from calibrated images, learning-based multi-view stereo (MVS) methods typically perform multi-view depth estimation and then fuse depth maps into a mesh or point cloud. To improve the computational efficiency, many methods initialize a coarse depth map and then gradually refine it at higher resolutions. Recently, diffusion models have achieved great success in generation tasks. Starting from random noise, diffusion models gradually recover the sample with an iterative denoising process. In this paper, we propose a novel MVS framework, which introduces diffusion models in MVS. Specifically, we formulate depth refinement as a conditional diffusion process. Considering the discriminative characteristic of depth estimation, we design a condition encoder to guide the diffusion process. To improve efficiency, we propose a novel diffusion network combining a lightweight 2D U-Net and a convolutional GRU. Moreover, we propose a novel confidence-based sampling strategy to adaptively sample depth hypotheses based on the confidence estimated by the diffusion model. Based on our novel MVS framework, we propose two novel MVS methods, DiffMVS and CasDiffMVS. DiffMVS achieves competitive performance with state-of-the-art efficiency in run-time and GPU memory. CasDiffMVS achieves state-of-the-art performance on DTU, Tanks & Temples and ETH3D.
- Research Article
1
- 10.24355/dbbs.084-201802201236
- Jan 1, 2018
- Digitale Bibliothek Braunschweig (Verbundzentrale Göttingen (VZG))
In the last decades, the field of Geomatics applied to architectural and cultural heritage has benefited from some major advances. Upon their introduction, laser scanners revolutionized the surveying world, gradually establishing themselves as a basic tool, at first for terrestrial and later for airborne surveys. At about the same time, photogrammetry also experienced its own evolution, culminating with Structure from Motion (SfM) and Multi-View Stereo (MVS) algorithms. These algorithms generate dense 3D colour point clouds that, however, may not always be considered reliable. In fact, matching algorithms can be very sensitive to data collection, lighting and texture, and accuracy control is difficult. The high automation levels attainable also require caution because, while allowing for quicker modelling, control and perception of the steps to follow become looser. In terrestrial surveys, accessibility can still present challenging issues, where both Terrestrial Laser Scanning (TLS) and terrestrial photogrammetry are not viable options. Many situations do not allow acquisition of both images and TLS data, necessary to generate 3D models. A solution to this problem is a new technique for the acquisition of photogrammetric data, based on the use of Unmanned Aerial Vehicles (UAVs). Within the broader field of study of terrestrial and UAV-based photogrammetry in architectural and cultural heritage contexts, this thesis focuses on four aspects in order to provide an operating methodology for surveys: (1) the influence of the number and position of Ground Control Points (GCPs) and tie points in SfM and MVS techniques; (2) the best methods for survey assessment; (3) TLS/SfM-MVS integration; (4) original applications in architecture surveys. In addition, the introduction of UAV-based applications has been investigated in some cases. The thesis provides guidelines for low-cost terrestrial and UAV-based photogrammetry aimed at anyone involved in surveying in architectural and cultural heritage contexts.
These guidelines include methodologies for accuracy checks and data integration as well as a workflow enabling survey optimization and devising original applications. In addition, integrating the research aspects has made it possible to provide accuracy checks of the acquired data and integration of data from different sources, as well as accuracy controls of both the single-technique models and the models obtained through technique integration.
- Research Article
36
- 10.1111/phor.12456
- Aug 13, 2023
- The Photogrammetric Record
3D reconstruction of scenes using multiple images, relying on robust correspondence search and depth estimation, has been thoroughly studied for the two‐view and multi‐view scenarios in recent years. Multi‐view stereo (MVS) algorithms aim to generate a rich, dense 3D model of the scene in the form of a dense point cloud or a triangulated mesh. In a typical MVS pipeline, the robust estimations for the camera poses along with the sparse points obtained from structure from motion (SfM) are used as input. During this process, the depth of generally every pixel of the scene is to be calculated. Several methods, either conventional or, more recently, learning‐based, have been developed for solving the correspondence search problem. A vast amount of research exists in the literature using local, global or semi‐global stereo matching approaches, with the PatchMatch algorithm being among the most popular and efficient conventional ones in the last decade. Yet, despite the widespread evolution of the algorithms, yielding complete, accurate and aesthetically pleasing 3D representations of a scene remains an open issue in real‐world and large‐scale photogrammetric applications. This work aims to provide a concrete survey on the most widely used MVS methods, investigating underlying concepts and challenges. To this end, the theoretical background and related literature are discussed for both conventional and learning‐based approaches, with a particular focus on close‐range 3D reconstruction applications.
- Research Article
9
- 10.26833/ijeg.1366146
- Jul 28, 2024
- International Journal of Engineering and Geosciences
Rapid and accurate surveying has always attracted great interest in all scientific and industrial activities that require high-resolution topographic data. The latest automation and advancement in geomatics engineering are remote sensing solutions using Unmanned Aerial Systems (UAS) and Structure from Motion (SfM) with Multi-View Stereo (MVS) photogrammetry. This research aimed to find the influence of flight height, Ground Control Point (GCP), and software on the geometric accuracy of UAS-SfM-derived Digital Surface Models (DSMs) and orthoimages, as well as to analyze and evaluate the accuracy of UAS-SfM as a rapid and low-cost alternative to conventional survey methods. To achieve the aim of the study, aerial surveys using a fixed-wing UAS and field surveys using RTK GNSS and total station were conducted. A total of 16 photogrammetric projects were processed using different GCP configurations, and detailed statistical analysis was performed on the results. Moreover, the contribution of cross flight on bundle adjustment was investigated empirically by conducting a combined photogrammetric image processing. The analysis revealed that flight height, GCP number and distribution, and the processing software significantly affect products' quality and accuracy. Evaluation of the achieved accuracies was made based on the American Society for Photogrammetry and Remote Sensing (ASPRS) positional accuracy standard for digital geospatial data. The findings of this study revealed that using the optimal flight height and GCP configuration, 3D models, orthomosaics and DSMs can be rapidly reconstructed from 2D images with the quality and accuracy sufficient for most terrain analysis applications, including civil engineering projects.
- Research Article
62
- 10.1002/esp.4502
- Oct 9, 2018
- Earth Surface Processes and Landforms
Landslides represent hazardous phenomena, often with significant implications. Monitoring landslides with time‐series surface observations can indicate surface failure. Unmanned aerial vehicles (UAVs) employing compact digital cameras, in conjunction with structure‐from‐motion (SfM) and multi‐view stereo (MVS) image processing approaches, have become commonplace in the geoscience research community. These methods offer relatively low‐cost, flexible solutions for many geomorphological monitoring applications. However, conventionally ground control points (GCPs) are required for registration purposes, the provision of which is often expensive, difficult or even impracticable in hazardous and inaccessible terrain. In an attempt to overcome the reliance on GCPs, this paper reports research that has developed a morphology‐based strategy to co‐register multi‐temporal UAV‐derived products. It applies the attribute of curvature in combination with the scale‐invariant feature transform algorithm, to generate time‐invariant curvature features, which serve as pseudo‐GCPs. Openness, a surface morphological digital elevation model derivative, is applied to identify relatively stable ground regions from which pseudo‐GCPs are selected. A sensitivity threshold quantifies the minimum detectable change alongside unresolved biases and misalignment errors. The approach is evaluated at two study sites in the UK, first at Sandford with artificially induced surface change, and second at an active landslide at Hollin Hill, with multi‐epoch SfM‐MVS products derived from a consumer‐grade UAV. Elevation changes and annual displacement rates at dm‐level are estimated, with optimal results achieved over winter periods. The morphology‐based co‐registration strategy resulted in relative error ratios (i.e. mean error divided by average flying height) in the range 1:800–2500, comparable with those reported by similar studies conducted with UAVs augmented with real time kinematic (RTK)‐Global Navigation Satellite Systems. Analysis demonstrates the potential of the morphology‐based strategy for a semi‐automatic, and practical co‐registration approach to quantify surface motion. This can ultimately complement geotechnical and geophysical investigations and support the understanding of landslide behaviour, model prediction and construction of measures for mitigating risks. © 2018 John Wiley & Sons, Ltd.
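The relative error ratio defined above (mean error divided by average flying height) is usually quoted in the form 1:N. A small worked example follows, using made-up numbers that are not taken from the study; the function name is hypothetical.

```python
def relative_error_denominator(mean_error_m, flying_height_m):
    """Return N such that mean error : flying height reads as 1:N."""
    return flying_height_m / mean_error_m

# Hypothetical survey: 0.05 m mean error at an average flying height of 100 m.
n = relative_error_denominator(0.05, 100.0)
label = f"1:{n:.0f}"
```

A larger N means a smaller error relative to the flying height, so these illustrative numbers would sit inside the 1:800–2500 range quoted above.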
- Research Article
3
- 10.1002/cav.1979
- Nov 26, 2020
- Computer Animation and Virtual Worlds
In this paper, we propose a novel Multiview Stereo (MVS) method which can effectively estimate geometry in low‐textured regions. Conventional MVS algorithms predict geometry by performing dense correspondence estimation across multiple views under the constraint of epipolar geometry. As low‐textured regions contain less feature information for reliable matching, estimating geometry for low‐textured regions remains hard work for previous MVS methods. To address this issue, we propose an MVS method based on texture enhancement. By enhancing texture information for each input image via our multiscale bilateral decomposition and reconstruction algorithm, our method can estimate reliable geometry for low‐textured regions that are intractable for previous MVS methods. To densify the final output point cloud, we further propose a novel selective joint bilateral propagation filter, which can effectively propagate reliable geometry estimation to neighboring unpredicted regions. We validate the effectiveness of our method on the ETH3D benchmark. Quantitative and qualitative comparisons demonstrate that our method can significantly improve the quality of reconstruction in low‐textured regions.
- Research Article
18
- 10.1016/j.isprsjprs.2023.11.020
- Nov 28, 2023
- ISPRS Journal of Photogrammetry and Remote Sensing
Edge aware depth inference for large-scale aerial building multi-view stereo
- Research Article
7
- 10.3390/s24082400
- Apr 9, 2024
- Sensors (Basel, Switzerland)
With the widespread adoption of modern RGB cameras, an abundance of RGB images is available everywhere. Therefore, multi-view stereo (MVS) 3D reconstruction has been extensively applied across various fields because of its cost-effectiveness and accessibility, which involves multi-view depth estimation and stereo matching algorithms. However, MVS tasks face noise challenges because of natural multiplicative noise and negative gain in algorithms, which reduce the quality and accuracy of the generated models and depth maps. Traditional MVS methods often struggle with noise, relying on assumptions that do not always hold true under real-world conditions, while deep learning-based MVS approaches tend to suffer from high noise sensitivity. To overcome these challenges, we introduce LNMVSNet, a deep learning network designed to enhance local feature attention and fuse features across different scales, aiming for low-noise, high-precision MVS 3D reconstruction. Through extensive evaluation of multiple benchmark datasets, LNMVSNet has demonstrated its superior performance, showcasing its ability to improve reconstruction accuracy and completeness, especially in the recovery of fine details and clear feature delineation. This advancement brings hope for the widespread application of MVS, ranging from precise industrial part inspection to the creation of immersive virtual environments.