General composition method for optical-plate-based LCD multi-view stereo image
Multi-view stereo image composition depends largely on the type of multi-view stereo display device. Currently, optical-plate-based multi-view stereo LCD displays are the most popular, yet there is a lack of a general composition method for this kind of display. A new general composition method is proposed for optical-plate-based multi-view stereo LCD displays. The method is made up of three parts: sub-pixel judgment, sub-pixel sub-sampling for each view, and sub-pixel arrangement and composition of each view. This method covers all configurations of optical-plate-based multi-view stereo LCD displays and is widely applicable. The correctness and validity of the proposed method are verified by experiments.
- Conference Article
1
- 10.1109/3dtv.2009.5069646
- May 1, 2009
Multiview stereo image composition mainly depends on the type of the multiview stereo display device. Currently, optical-plate LCD autostereoscopic multiview displays are common, while composition methods for them are limited. This paper proposes a new general multiview LCD stereo image composition method for optical-plate LCD stereo display devices. The proposed method consists of three main steps: sub-pixel judgment, sub-sampling of the sub-pixels of each view, and arrangement and composition of the sub-pixels. The proposed method covers all possible cases of the optical-plate LCD stereo display device and has good universality and applicability. The feasibility of the proposed method is verified on a specific stereo display device.
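The three steps named in this abstract can be illustrated with a minimal sketch. The abstract does not give the mapping formula, so `view_index`, its `row_shift` parameter, and the cyclic assignment rule are hypothetical stand-ins chosen only for illustration; a real display would derive the mapping from its optical-plate geometry.

```python
def view_index(x, y, c, num_views, row_shift=1):
    """Sub-pixel judgment: pick the view that owns sub-pixel c of pixel (x, y).

    Hypothetical rule (an assumption, not the paper's formula): views cycle
    along the sub-pixel columns and shift by `row_shift` sub-pixels per row,
    which mimics a slanted optical plate.
    """
    k = 3 * x + c  # absolute sub-pixel column (c: 0=R, 1=G, 2=B)
    return (k + row_shift * y) % num_views


def compose(views, row_shift=1):
    """Sub-sample each view and arrange the samples into one composite image.

    views[v][y][x] is an (r, g, b) tuple; all views share the display size.
    """
    num_views = len(views)
    height = len(views[0])
    width = len(views[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Each channel of the output pixel may come from a different view.
            pixel = tuple(
                views[view_index(x, y, c, num_views, row_shift)][y][x][c]
                for c in range(3)
            )
            row.append(pixel)
        out.append(row)
    return out


# Two tiny 2x2 views whose pixels carry their own view index.
views = [[[(v, v, v)] * 2 for _ in range(2)] for v in range(2)]
composite = compose(views)
```

In the composite, each sub-pixel value shows which view it was taken from, so the interleaving pattern can be inspected directly.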
- Conference Article
15
- 10.1109/cvprw.2018.00065
- Jun 1, 2018
Depth estimation from multi-view stereo images is one of the most fundamental and essential tasks in understanding scene imagery. In this paper, we propose a machine learning technique based on deep convolutional neural networks (CNNs) for multi-view stereo matching. The proposed method uses a deep architecture to measure the matching cost between pairs of two-view stereo images among the multi-view stereo images in order to extract depth values. Moreover, we present a confidence estimation network for incorporating the cost volumes along the depth hypotheses in multi-view stereo. Experiments on the DTU dataset show that our depth map estimated from multiple views performs better than other matching similarity measures.
- Research Article
2
- 10.5194/isprs-archives-xlviii-1-w2-2023-1075-2023
- Dec 13, 2023
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. In this paper, we propose a method for performing 3D reconstruction by generating virtual RPC parameters from multi-view satellite stereo images provided by Google Earth (GE) software. For multi-view stereo (MVS) images in the general case, a dense 3D surface can be reconstructed once the pose and parameters of the camera are estimated. However, in the case of satellite images, it is not easy to obtain the original images with pose parameters for an area of interest. GE software can provide images across the globe, but these images are georeferenced and modified to fit ground control points (GCPs), so there is no camera model to explain the projection relationship. Therefore, the purpose of the proposed method is to perform 3D reconstruction by generating virtual camera parameters for the modified satellite images obtained from GE software. In the proposed method, satellite images obtained from GE are assumed to be pinhole images, and structure from motion (SfM) is used for initial reconstruction. After initial reconstruction, the 3D model is transformed from a distorted hexahedral space formed along a pixel ray to a UTM coordinate system metric space through 3D homography-based georeferencing. Virtual rational polynomial camera (RPC) parameters are calculated from the correspondences between the satellite images and 3D points in UTM coordinates. The final result is generated by the MVS method using the RPC model with the virtual RPC. The DSM reconstructed using the virtual RPC improves on the initial reconstruction, and error measurement in the area with ground truth (GT) yields a mean absolute error (MAE) of 1.366 m on average.
- Conference Article
1
- 10.1109/icdsp.2011.6004994
- Jul 1, 2011
Future stereoscopic (3D) systems will become multiview capable, giving the user a more realistic 3D experience since they will not be limited to one view. This will make 3D technology more realistic; however, viewing discomfort will still be an issue. When viewing stereoscopic images, one cause of viewing discomfort is that the images appear unnaturally sharp across the entire range of depth. To correct this problem for multiview images, a 3D filtering approach is proposed that reduces the computation time required, since the filter need only be applied once, whereas conventional 2D filtering techniques would have to be performed 2n times (where n is the number of views). In an initial experiment with 15 people, the proposed filter (on average) received similar ratings for discomfort and naturalness when compared to the well-established 2D bilateral filters. The benefit of this work is that it provides an alternative method for filtering multiview images at a low cost while obtaining similar results to bilateral filters, making it a useful filter for a wide range of future multiview stereo systems/applications.
- Conference Article
57
- 10.1109/iccv.2019.00114
- Oct 1, 2019
Highly accurate 3D volumetric reconstruction is still an open research topic where the main difficulty is usually related to merging some rough estimations with high-frequency details. One of the most promising methods is the fusion between multi-view stereo and photometric stereo images. Besides the intrinsic difficulties that multi-view stereo and photometric stereo each face in order to work reliably, supplementary problems arise when they are considered together. In this work, we present a volumetric approach to the multi-view photometric stereo problem. The key point of our method is the signed distance field parameterisation and its relation to the surface normal. This is exploited in order to obtain a linear partial differential equation, which is solved in a variational framework that combines multiple images from multiple points of view in a single system. In addition, the volumetric approach is naturally implemented on an octree, which allows for fast ray-tracing that reliably alleviates occlusions and cast shadows. Our approach is evaluated on synthetic and real datasets and achieves state-of-the-art results.
- Conference Article
2
- 10.1109/icme.2009.5202649
- Jun 1, 2009
We present a 3D object relighting technique for multiview-multi-lighting (MVML) image sets. Our relighting technique is a fusion of the multi-view stereo (MVS) technique and the image-based relighting (IBL) technique. The MVML dataset consists of multiple camera views, with each view filmed under multiple time-multiplexed illumination modes. A multi-view 3D reconstruction is first computed using a traditional multi-view stereo algorithm. After this, the reconstructed model is relit through an image-based relighting scheme for each camera view, followed by a view-independent texture mapping procedure. Interactive relighting results demonstrate high reconstruction accuracy, realistic relighting effects and real-time relighting performance. Moreover, our relighting technique is suitable for dynamic 3D object relighting.
- Research Article
6
- 10.5194/isprs-archives-xlviii-1-w3-2023-123-2023
- Oct 19, 2023
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. 3D reconstruction from single and multi-view stereo images is still an open research topic, despite the high number of solutions proposed in the last decades. The surge of deep learning methods has stimulated the development of new methods using monocular (MDE, Monocular Depth Estimation), stereoscopic and Multi-View Stereo (MVS) 3D reconstruction, showing promising results, often comparable to or even better than traditional methods. The more recent development of NeRF (Neural Radiance Fields) has further increased interest in this kind of solution. Most of the proposed approaches, however, focus on terrestrial applications (e.g., autonomous driving or 3D reconstruction of small artefacts), while airborne and UAV acquisitions are often overlooked. The recent introduction of new datasets, such as UseGeo, has therefore given the opportunity to assess how state-of-the-art MDE, MVS and NeRF 3D reconstruction algorithms perform on airborne UAV images, allowing their comparison with LiDAR ground truth. This paper presents the results achieved by two MDE, two MVS and two NeRF approaches leveraging deep learning, trained and tested using the UseGeo dataset. This work allows comparison with a ground truth, showing the current state of the art of these solutions and providing useful indications for their future development and improvement.
- Conference Article
- 10.1109/dcc.1997.582103
- Jan 1, 1997
Summary form only given. Multiview stereo imaging uses arrays of cameras to capture scenes from multiple perspectives. This form of imagery is used in systems that allow the user to survey the scene, for example by head motion. Very little work has been reported on compression schemes for multiview images. Multiview image sets tend to be very large because they may contain several hundred views, but there is considerable redundancy among the views which makes them highly compressible. This paper compares methods for compressing large multiview stereo image sets. There is an obvious similarity between multiview image sets and video sequences. As a baseline we compressed a set of multiview stereo images with JPEG on each image individually and MPEG-1 applied to the whole set. The average bits per pixel were reduced by roughly a factor of two over individual frame compression, at constant mean square error (MSE). Stereo specific perceptual distortions can be viewed in anaglyph representations of the data set. Another method, unique to this data type, is based on residual coding with respect to a synthetic "panoramic still" containing information from all of the images in the set. In this method we synthesize a single panoramic image from all of the members of a registered set, code the panoramic image, and then code the residual images formed by subtracting the individual images from the corresponding position on the panorama. Initial results with this method appear to give a similar MSE rate distortion curve as the MPEG based techniques. However, the panoramic still method is inherently random access.
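The panoramic residual scheme described in this abstract can be sketched on 1D scanlines. This is only an illustration under stated assumptions: registration is reduced to known integer offsets, the panorama is a plain average of overlapping samples, and no quantization or entropy coding is applied; the function names are hypothetical.

```python
def build_panorama(images, offsets, width):
    """Average all registered images into one panoramic scanline."""
    total = [0.0] * width
    count = [0] * width
    for img, off in zip(images, offsets):
        for i, value in enumerate(img):
            total[off + i] += value
            count[off + i] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]


def encode(images, offsets, panorama):
    """Residual = panorama at the image's position minus the image itself."""
    return [
        [panorama[off + i] - value for i, value in enumerate(img)]
        for img, off in zip(images, offsets)
    ]


def decode_one(panorama, residual, offset):
    """Random access: one view needs only the panorama and its own residual."""
    return [panorama[offset + i] - r for i, r in enumerate(residual)]


images = [[10, 12, 14], [13, 15, 16]]  # two overlapping views
offsets = [0, 1]                       # assumed registration result
panorama = build_panorama(images, offsets, width=4)
residuals = encode(images, offsets, panorama)
second_view = decode_one(panorama, residuals[1], offsets[1])
```

Because `decode_one` never touches the other residuals, any single view can be recovered independently, which is the random-access property the abstract points out.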
- Research Article
1
- 10.3390/rs16203863
- Oct 17, 2024
- Remote Sensing
In this paper, we propose a 3D Digital Surface Model (DSM) reconstruction method from uncalibrated Multi-view Satellite Stereo (MVSS) images, where Rational Polynomial Coefficient (RPC) sensor parameters are not available. While recent investigations have introduced several techniques to reconstruct high-precision and high-density DSMs from MVSS images, they inherently depend on the use of geo-corrected RPC sensor parameters. However, RPC parameters from satellite sensors can be erroneous due to inaccurate sensor data. In addition, due to the increasing data availability from the internet, uncalibrated satellite images can be easily obtained without RPC parameters. This study proposes a novel method to reconstruct a 3D DSM from uncalibrated MVSS images by estimating and integrating RPC parameters. To do this, we first employ a structure from motion (SfM) and 3D homography-based geo-referencing method to reconstruct an initial DSM. Second, we sample 3D points from the initial DSM as references and reproject them to the 2D image space to determine 3D–2D correspondences. Using the correspondences, we directly calculate all RPC parameters. To overcome memory shortages when processing large satellite images, we also propose an RPC integration method. The image space is partitioned into multiple tiles, and RPC estimation is performed independently in each tile. Then, all tiles’ RPCs are integrated into the final RPC to represent the geometry of the whole image space. Finally, the integrated RPC is used to run a true MVSS pipeline to obtain the 3D DSM. The experimental results show that the proposed method achieves a Mean Absolute Error (MAE) of 1.455 m in height map reconstruction on multi-view satellite benchmark datasets. We also show that the proposed method can be used to reconstruct a geo-referenced 3D DSM from uncalibrated and freely available Google Earth imagery.
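For orientation, the standard RPC model writes normalized image coordinates as ratios of cubic polynomials in normalized ground coordinates. The notation below is a generic statement of that model; whether the paper solves for the coefficients with exactly this linearized least-squares form is an assumption, since the abstract only says the parameters are calculated directly from the 3D–2D correspondences.

```latex
% Standard RPC model: (r_n, c_n) are normalized image row/column,
% (X_n, Y_n, Z_n) are normalized ground coordinates.
r_n = \frac{P_1(X_n, Y_n, Z_n)}{P_2(X_n, Y_n, Z_n)}, \qquad
c_n = \frac{P_3(X_n, Y_n, Z_n)}{P_4(X_n, Y_n, Z_n)}, \qquad
P_k(X, Y, Z) = \sum_{i + j + l \le 3} a^{(k)}_{ijl}\, X^i Y^j Z^l .

% Each P_k has 20 coefficients. Multiplying out the denominator gives,
% for every 3D--2D correspondence, equations that are linear in the coefficients:
P_1(X_n, Y_n, Z_n) - r_n\, P_2(X_n, Y_n, Z_n) = 0, \qquad
P_3(X_n, Y_n, Z_n) - c_n\, P_4(X_n, Y_n, Z_n) = 0 .
```

With the constant terms of the denominators fixed to 1, stacking these equations over many correspondences gives an overdetermined linear system, which is one plausible reading of how the parameters can be estimated from sampled 3D points alone.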
- Dissertation
- 10.14711/thesis-991012786067603412
- Jan 1, 2019
Multi-view stereo (MVS) reconstructs 3D representations of the scene from imagery, which is a core problem of computer vision extensively studied for decades. Traditionally, MVS algorithms apply hand-crafted similarity metrics and engineered regularizations to compute dense correspondences. While these methods have shown great results under ideal Lambertian scenarios, classical MVS algorithms still suffer from numerous artifacts. In this thesis, we propose to advance MVS reconstruction using recent deep learning techniques. First, we present an end-to-end deep learning architecture, MVSNet, for depth map inference from multi-view images. The key contribution of this part is the careful integration of multi-view geometry with convolutional neural networks (CNNs). In the network, we extract deep image features and build the 3D cost volume upon the camera frustum via differentiable homography warping. Then, 3D convolutions are applied to regularize the cost volume and regress the output depth map. We demonstrate on the DTU dataset that MVSNet significantly outperforms previous state-of-the-art methods in both reconstruction completeness and overall quality. Next, we propose to extend the MVSNet architecture for large-scale MVS reconstruction. One major limitation of current learning-based approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes. To this end, we sequentially regularize 2D cost maps via the gated recurrent unit (GRU) rather than regularizing the entire 3D cost volume in one go. The GRU regularization dramatically reduces memory consumption and makes high-resolution reconstructions feasible. The proposed R-MVSNet is evaluated on the large-scale Tanks and Temples dataset and achieves results comparable to classical large-scale MVS algorithms. Finally, we establish a large-scale synthetic MVS dataset, BlendedMVS, based on blended images and rendered depth maps.
While several MVS datasets have been proposed, they fail to provide accurate depth and occlusion information because ground truth mesh models are usually incomplete. We therefore establish a new MVS dataset based on model rendering. Textured meshes are first reconstructed from images of different scenes and are then rendered into color images, depth maps and occlusion maps. We further blend rendered images with input images using high-pass and low-pass filters to generate our training input. Extensive experiments demonstrate that models trained on BlendedMVS achieve significantly better generalization ability than models trained on other MVS datasets. In sum, this thesis presents a complete learning-based solution to large-scale multi-view stereopsis, including a current baseline network (MVSNet), its large-scale extension (R-MVSNet) and a large-scale synthetic dataset (BlendedMVS). We bridge the gap between classical MVS reconstructions and recent deep learning techniques and demonstrate the effectiveness of learning-based MVS through extensive experiments on different datasets.
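The step of regressing a depth value from a regularized cost volume can be illustrated per pixel with a soft argmin, i.e. a probability-weighted average over depth hypotheses. The abstract does not state which regression is used, so treating it as a soft argmin is an assumption here, and `regress_depth` is a hypothetical helper rather than code from the thesis.

```python
import math


def regress_depth(costs, depths):
    """Expected depth under a softmax over negated matching costs.

    costs[i] is the (regularized) matching cost of depth hypothesis depths[i];
    lower cost means a more likely hypothesis.
    """
    lowest = min(costs)
    # Subtract the minimum cost before exponentiating for numerical stability.
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, depths)) / total


depth_hypotheses = [2.0, 4.0, 6.0]
flat = regress_depth([1.0, 1.0, 1.0], depth_hypotheses)     # uniform costs
peaked = regress_depth([0.0, 50.0, 50.0], depth_hypotheses)  # one clear winner
```

Unlike a hard argmin, this expectation is differentiable and can land between the sampled hypotheses, which is why it suits end-to-end training.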
- Conference Article
4
- 10.1117/12.2083202
- Mar 17, 2015
- Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
This paper presents a new multi-view stereo image synthesis method using binocular symmetric hole filling. In autostereoscopic displays, multi-view synthesis is needed to provide multiple perspectives of the same scene, as viewed from multiple viewing positions. When an image is warped to a distant virtual viewpoint, it is difficult to generate visually plausible multi-view stereo images, since very large hole regions (i.e., disoccluded regions) can be induced. Also, binocular asymmetry between the synthesized left-eye and right-eye images is one of the critical factors leading to visual discomfort in stereoscopic viewing. In this paper, we maintain binocular symmetry using the already filled regions in an adjacent view. The proposed method introduces binocular symmetric hole filling based on global optimization for binocular symmetry in the synthesized multi-view stereo images. The experimental results show that the proposed method outperforms existing methods.
- Research Article
- 10.1080/2150704x.2023.2283901
- Nov 23, 2023
- Remote Sensing Letters
With the increased availability of multi-view satellite images, the number of investigations on 3D urban scene reconstruction from multiple satellite images is also increasing. Conventional Multi-View Stereo (MVS) pipelines require the calibrated pose information of the satellite cameras to determine the epipolar geometry and the 3D structure of the stereo correspondences. In this study, we propose a novel Monocular Height estimation and Fusion (MHF) method for 3D reconstruction from uncalibrated multi-view satellite images. By employing a learned monocular depth network, the proposed method first obtains the height map of each satellite image. Second, all height maps obtained from the multi-view images are fused into a refined height map in each image plane. To fuse the height maps, all maps are affine transformed to a virtual reference coordinate system, and the transformed maps are then projected to the image plane of each camera coordinate system. The monocular depth network was trained and evaluated on the Data Fusion Contest 2019 (DFC19) dataset including Jacksonville, FL, and Omaha, NE. We also evaluate on the ATL-SN4 dataset covering Atlanta, GA, to test on new urban scenes not seen during training.
- Research Article
24
- 10.1109/tgrs.2022.3183567
- Jan 1, 2022
- IEEE Transactions on Geoscience and Remote Sensing
We present a novel 3D instance segmentation framework for Multi-View Stereo (MVS) buildings in urban scenes. Unlike existing works focusing on semantic segmentation of urban scenes, the emphasis of this work lies in detecting and segmenting 3D building instances even if they are attached and embedded in a large and imprecise 3D surface model. Multi-view RGB images are first enhanced to RGBH images by adding a heightmap and are segmented to obtain all roof instances using a fine-tuned 2D instance segmentation neural network. Instance masks from different multi-view images are then clustered into global masks. Our mask clustering accounts for spatial occlusion and overlapping, which can eliminate segmentation ambiguities among multi-view images. Based on these global masks, 3D roof instances are segmented out by mask back-projections and extended to the entire building instances through a Markov random field optimization. A new dataset that contains instance-level annotation for both 3D urban scenes (roofs and buildings) and drone images (roofs) is provided. To the best of our knowledge, it is the first outdoor dataset dedicated to 3D instance segmentation, with many more annotations of attached 3D buildings than existing datasets. Quantitative evaluations and ablation studies show the effectiveness of all major steps and the advantages of our multi-view framework over the orthophoto-based method.
- Research Article
3
- 10.5194/isprs-archives-xliii-b2-2022-153-2022
- May 30, 2022
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. A dense point cloud with rich and realistic texture is generated from multiview images using dense reconstruction algorithms such as Multi View Stereo (MVS). However, its spatial precision depends on the performance of the matching and dense reconstruction algorithms used. Moreover, outliers are usually unavoidable owing to mismatched image features. The lidar point cloud lacks texture but offers better spatial precision because it avoids computational errors. This paper proposes a multiresolution patch-based 3D dense reconstruction method based on integrating multiview images and the laser point cloud. A sparse point cloud is first generated from multiview images by Structure from Motion (SfM) and then registered with the laser point cloud to establish the mapping relationship between the laser point cloud and the multiview images. The laser point cloud is reprojected to the multiview images. The corresponding optimal level of the image pyramid is predicted from the distance distribution of projected pixels and is used as the starting level for patch optimization during dense reconstruction. The laser point cloud is used as stable seed points for patch growth and expansion and is stored in a dynamic octree structure. Subsequently, the corresponding patches are optimized and expanded with the pyramid image to achieve multiscale and multiresolution dense reconstruction. In addition, the octree’s spatial index structure facilitates highly efficient parallel computing. The experimental results show that the proposed method is superior to traditional MVS technology in terms of model accuracy and completeness and has broad application prospects in high-precision 3D modeling of large scenes.
- Research Article
8
- 10.1007/s11554-017-0745-9
- Feb 15, 2018
- Journal of Real-Time Image Processing
PatchMatch multi-view stereo (MVS) is a method for generating depth maps from multi-view images and is expected to be used for various applications such as robot vision, 3D measurement, and 3D reconstruction. The major drawback of PatchMatch MVS is its high computational cost, and its acceleration is strongly desired. However, this acceleration is prevented by two problems. First, because PatchMatch MVS estimates depth maps by propagating estimation results among neighboring pixels, it is not suitable for GPU-based acceleration. Second, since the shape of the matching window used for stereo matching changes dynamically, reading its pixels is inefficient in terms of memory access. This paper proposes an FPGA accelerator that exploits on-chip FIFOs efficiently to solve the propagation problem. Moreover, reading the pixels of a matching window is improved by a cover window, which has a fixed shape and covers the matching window. The FPGA accelerator is designed using a design tool based on Open Computing Language (OpenCL). Although the parameters of PatchMatch MVS depend on the object images, these parameters can be changed easily with the OpenCL-based design. The experimental results demonstrate that the FPGA implementation achieves 3.4 and 2.2 times faster processing speeds than the CPU and GPU ones, respectively, and the power-delay product of the FPGA implementation is 3.2 and 5.7% of the CPU and GPU ones, respectively.