A general multiview LCD stereo image composition method based on optical plate technology
Multiview stereo image composition depends mainly on the type of multiview stereo display device. Multiview LCD autostereoscopic displays based on optical plates are currently common, but the composition methods available for them are limited. This paper proposes a new general multiview LCD stereo image composition method for optical-plate LCD stereo display devices. The method consists of three steps: sub-pixel judgment, sub-sampling of the sub-pixels of each view, and arrangement and composition of the sub-pixels. It covers all possible cases of the optical-plate LCD stereo display device and therefore has good universality and applicability. The feasibility of the proposed method is verified on a specific stereo display device.
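The three steps named in the abstract (sub-pixel judgment, per-view sub-sampling, arrangement and composition) can be illustrated with a minimal sketch. The mapping rule below is an assumption made for illustration only: it advances the view index by one per sub-pixel column and shifts it by a hypothetical `slant_step` per row, which imitates a slanted optical plate. It is not the rule derived in the paper, and the function names are invented here.

```python
import numpy as np

def view_index_map(height, width, num_views, slant_step=1):
    """Sub-pixel judgment: decide which view each R/G/B sub-pixel shows.

    Assumed (hypothetical) rule: the view index advances by one per
    sub-pixel column and shifts by `slant_step` sub-pixels per row.
    """
    rows = np.arange(height)[:, None, None]   # panel row
    cols = np.arange(width)[None, :, None]    # panel pixel column
    chans = np.arange(3)[None, None, :]       # 0=R, 1=G, 2=B sub-pixel
    subpixel_col = 3 * cols + chans           # horizontal sub-pixel position
    return (subpixel_col + slant_step * rows) % num_views

def compose(views, slant_step=1):
    """Interleave N same-sized views of shape (N, H, W, 3) into one panel image.

    Sub-sampling is implicit: each view contributes only the sub-pixels
    that the index map assigns to it.
    """
    views = np.asarray(views)
    num_views, height, width, _ = views.shape
    index = view_index_map(height, width, num_views, slant_step)
    # For every sub-pixel, take the value from the view selected by `index`.
    return np.take_along_axis(views, index[None, ...], axis=0)[0]
```

Filling view k with the constant value k makes the composed panel equal to the index map itself, which is a quick way to inspect the arrangement a given rule produces.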
- Research Article
2
- 10.3724/sp.j.1087.2008.00195
- Jun 30, 2008
- Journal of Computer Applications
Multi-view stereo image composition is largely dependent on the type of multi-view stereo display device. Optical-plate-based multi-view stereo LCD displays are currently the most popular, yet a general composition method for this kind of display is lacking. A new general composition method is proposed for these optical-plate-based multi-view stereo LCD displays. The method is made up of three parts: sub-pixel judgment, sub-pixel sub-sampling for each view, and sub-pixel arrangement and composition of each view. It covers all the possibilities of optical-plate-based multi-view stereo LCD displays and has good applicability and generality. The correctness and validity of the proposed method are verified by experiments.
- Conference Article
57
- 10.1109/iccv.2019.00114
- Oct 1, 2019
Highly accurate 3D volumetric reconstruction is still an open research topic where the main difficulty is usually related to merging rough estimations with high-frequency details. One of the most promising approaches is the fusion of multi-view stereo and photometric stereo. Besides the intrinsic difficulties that multi-view stereo and photometric stereo each face in order to work reliably, supplementary problems arise when they are considered together. In this work, we present a volumetric approach to the multi-view photometric stereo problem. The key point of our method is the signed distance field parameterisation and its relation to the surface normal. This is exploited in order to obtain a linear partial differential equation which is solved in a variational framework that combines multiple images from multiple points of view in a single system. In addition, the volumetric approach is naturally implemented on an octree, which allows for fast ray-tracing that reliably alleviates occlusions and cast shadows. Our approach is evaluated on synthetic and real datasets and achieves state-of-the-art results.
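The relation between a signed distance field and the surface normal mentioned in this abstract can be written in its standard textbook form. The symbols d (signed distance), n (unit normal) and x (3D point) are notation assumed here, and the paper's exact parameterisation may differ.

```latex
\[
  \mathbf{n}(\mathbf{x}) = \frac{\nabla d(\mathbf{x})}{\lVert \nabla d(\mathbf{x}) \rVert},
  \qquad
  \lVert \nabla d(\mathbf{x}) \rVert = 1 \quad \text{wherever $d$ is differentiable.}
\]
```

Because the gradient of a true signed distance field has unit length, the normal on the surface is simply the gradient of d, which is what makes constraints on the normal linear in the derivatives of d.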
- Research Article
2
- 10.5194/isprs-archives-xlviii-1-w2-2023-1075-2023
- Dec 13, 2023
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. In this paper, we propose a method for performing 3D reconstruction by generating virtual RPC parameters from multi-view satellite stereo images provided by Google Earth (GE) software. In the general multi-view stereo (MVS) case, a dense 3D surface can be reconstructed once the pose and parameters of the camera have been estimated. However, in the case of satellite images, it is not easy to obtain the original images of an area of interest together with their pose parameters. GE software can provide images from across the globe, but these images are georeferenced and modified to fit ground control points (GCPs), so there is no camera model to explain the projection relationship. Therefore, the purpose of the proposed method is to perform 3D reconstruction by generating virtual camera parameters for the modified satellite images obtained from GE software. In the proposed method, satellite images obtained from GE are treated as pinhole images and passed to structure from motion (SfM) for initial reconstruction. After initial reconstruction, the 3D model is transformed from a distorted hexahedral space formed along a pixel ray to a UTM coordinate system metric space through 3D homography-based georeferencing. Virtual rational polynomial camera (RPC) parameters are calculated from the correspondences between the satellite images and 3D points in UTM coordinates. The final result is generated by the MVS method using the RPC model with the virtual RPC. The DSM reconstructed using the virtual RPC improves on the initial reconstruction, and in the area where ground truth (GT) is available the mean absolute error (MAE) averages 1.366 m.
- Conference Article
15
- 10.1109/cvprw.2018.00065
- Jun 1, 2018
Depth estimation from multi-view stereo images is one of the most fundamental and essential tasks in understanding scene imagery. In this paper, we propose a machine learning technique based on deep convolutional neural networks (CNNs) for multi-view stereo matching. The proposed method uses a deep architecture to measure the matching cost between two-view stereo pairs drawn from the multi-view stereo images in order to extract depth values. Moreover, we present a confidence estimation network for incorporating the cost volumes along the depth hypotheses in multi-view stereo. Experiments on the DTU dataset show that our depth map estimated from multiple views outperforms those obtained with other matching similarity measures.
- Conference Article
1
- 10.1109/icdsp.2011.6004994
- Jul 1, 2011
Future stereoscopic (3D) systems will become multiview capable, giving the user a more realistic 3D experience because they will no longer be limited to one view. This will help to make 3D technology more realistic; however, viewing discomfort will still be an issue. When viewing stereoscopic images, one cause of viewing discomfort is that the images appear unnaturally sharp across the entire range of depth. To correct this problem for multiview images, a 3D filtering approach is proposed that reduces computation time because the filter need only be applied once, whereas conventional 2D filtering techniques would have to be performed 2n times (where n is the number of views). In an initial experiment with 15 people, the proposed filter received, on average, ratings for discomfort and naturalness similar to those of the well-established 2D bilateral filters. The benefit of this work is that it provides an alternative method for filtering multiview images at low cost while obtaining results similar to bilateral filters, making it a useful filter for a wide range of future multiview stereo systems and applications.
- Research Article
6
- 10.5194/isprs-archives-xlviii-1-w3-2023-123-2023
- Oct 19, 2023
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. 3D reconstruction from single and multi-view stereo images is still an open research topic, despite the high number of solutions proposed in the last decades. The surge of deep learning methods has stimulated the development of new methods for monocular (MDE, Monocular Depth Estimation), stereoscopic and Multi-View Stereo (MVS) 3D reconstruction, showing promising results that are often comparable to or even better than those of traditional methods. The more recent development of NeRF (Neural Radiance Fields) has further increased interest in this kind of solution. Most of the proposed approaches, however, focus on terrestrial applications (e.g., autonomous driving or 3D reconstruction of small artefacts), while airborne and UAV acquisitions are often overlooked. The recent introduction of new datasets such as UseGeo has therefore given the opportunity to assess how state-of-the-art MDE, MVS and NeRF 3D reconstruction algorithms perform on airborne UAV images, allowing their comparison with LiDAR ground truth. This paper presents the results achieved by two MDE, two MVS and two NeRF approaches leveraging deep learning, trained and tested on the UseGeo dataset. The comparison with ground truth shows the current state of the art of these solutions and provides useful indications for their future development and improvement.
- Dissertation
- 10.14711/thesis-991012786067603412
- Jan 1, 2019
Multi-view stereo (MVS) reconstructs 3D representations of a scene from imagery and is a core problem of computer vision that has been studied extensively for decades. Traditionally, MVS algorithms apply hand-crafted similarity metrics and engineered regularizations to compute dense correspondences. While these methods have shown great results under ideal Lambertian scenarios, classical MVS algorithms still suffer from numerous artifacts. In this thesis, we propose to advance MVS reconstruction using recent deep learning techniques. First, we present an end-to-end deep learning architecture, MVSNet, for depth map inference from multi-view images. The key contribution of this part is the careful integration of multi-view geometry and convolutional neural networks (CNNs). In the network, we extract deep image features and build the 3D cost volume upon the camera frustum via differentiable homography warping. Then, 3D convolutions are applied to regularize the cost volume and regress the output depth map. We demonstrate on the DTU dataset that MVSNet significantly outperforms previous state-of-the-art methods in both reconstruction completeness and overall quality. Next, we propose to extend the MVSNet architecture to large-scale MVS reconstruction. One major limitation of current learning-based approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes. To this end, we sequentially regularize 2D cost maps via a gated recurrent unit (GRU) rather than regularizing the entire 3D cost volume in one go. The GRU regularization dramatically reduces memory consumption and makes high-resolution reconstruction feasible. The proposed R-MVSNet is evaluated on the large-scale Tanks and Temples dataset and achieves results comparable to classical large-scale MVS algorithms. Finally, we establish a large-scale synthetic MVS dataset, BlendedMVS, based on blended images and rendered depth maps. While several MVS datasets have been proposed, they fail to provide accurate depth and occlusion information because ground truth mesh models are usually incomplete. We therefore establish a new MVS dataset based on model rendering. Textured meshes are first reconstructed from images of different scenes and are then rendered into color images, depth maps and occlusion maps. We further blend the rendered images with the input images using high-pass and low-pass filters to generate our training input. Extensive experiments demonstrate that models trained on BlendedMVS achieve significantly better generalization than models trained on other MVS datasets. In sum, this thesis presents a complete learning-based solution to large-scale multi-view stereopsis, including a current baseline network (MVSNet), its large-scale extension (R-MVSNet) and a large-scale synthetic dataset (BlendedMVS). We bridge the gap between classical MVS reconstruction and recent deep learning techniques and demonstrate the effectiveness of learning-based MVS through extensive experiments on different datasets.
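The step of regressing a depth map from a regularized cost volume can be sketched with a soft argmin, that is, the expected depth under a softmax over the depth hypotheses. This is a minimal NumPy illustration under stated assumptions: the function name, the array layout `(D, H, W)` and the "lower cost is better" convention are chosen here and are not taken from the thesis.

```python
import numpy as np

def soft_argmin_depth(cost_volume, depth_values):
    """Regress a depth map from a cost volume by soft argmin.

    cost_volume:  (D, H, W) array, lower cost = better match (assumed convention).
    depth_values: (D,) array of depth hypotheses.
    """
    # Softmax over the depth axis of the negated cost gives a per-pixel
    # probability for each hypothesis (shifted for numerical stability).
    logits = -np.asarray(cost_volume, dtype=float)
    logits = logits - logits.max(axis=0, keepdims=True)
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    # Expected depth under that distribution, shape (H, W).
    return np.tensordot(np.asarray(depth_values, dtype=float), prob, axes=(0, 0))
```

Unlike a hard argmin, the expectation is differentiable and can return depths between the sampled hypotheses, which is why this kind of regression suits end-to-end training.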
- Conference Article
2
- 10.1109/icme.2009.5202649
- Jun 1, 2009
We present a 3D object relighting technique for multiview-multi-lighting (MVML) image sets. Our relighting technique is a fusion of the multi-view stereo (MVS) technique and the image based relighting (IBL) technique. The MVML dataset consists of multiple camera views, each captured under multiple time-multiplexed illumination modes. A multi-view 3D reconstruction is first computed using a traditional multi-view stereo algorithm. The reconstructed model is then relit through an image based relighting scheme for each camera view, followed by a view-independent texture mapping procedure. Interactive relighting results demonstrate high reconstruction accuracy, realistic relighting effects and real-time relighting performance. Moreover, our relighting technique is suitable for dynamic 3D object relighting.
- Research Article
1
- 10.3390/rs16203863
- Oct 17, 2024
- Remote Sensing
In this paper, we propose a 3D Digital Surface Model (DSM) reconstruction method for uncalibrated Multi-view Satellite Stereo (MVSS) images, where Rational Polynomial Coefficient (RPC) sensor parameters are not available. While recent investigations have introduced several techniques to reconstruct high-precision and high-density DSMs from MVSS images, they inherently depend on the use of geo-corrected RPC sensor parameters. However, RPC parameters from satellite sensors can be erroneous due to inaccurate sensor data. In addition, due to the increasing availability of data on the internet, uncalibrated satellite images can easily be obtained without RPC parameters. This study proposes a novel method to reconstruct a 3D DSM from uncalibrated MVSS images by estimating and integrating RPC parameters. To do this, we first employ structure from motion (SfM) and a 3D homography-based geo-referencing method to reconstruct an initial DSM. Second, we sample 3D points from the initial DSM as references and reproject them to the 2D image space to determine 3D–2D correspondences. Using the correspondences, we directly calculate all RPC parameters. To overcome memory shortages when processing large satellite images, we also propose an RPC integration method. The image space is partitioned into multiple tiles, and RPC estimation is performed independently in each tile. Then, all tiles' RPCs are integrated into the final RPC to represent the geometry of the whole image space. Finally, the integrated RPC is used to run a true MVSS pipeline to obtain the 3D DSM. The experimental results show that the proposed method achieves a Mean Absolute Error (MAE) of 1.455 m in height map reconstruction on multi-view satellite benchmark datasets. We also show that the proposed method can be used to reconstruct a geo-referenced 3D DSM from uncalibrated and freely available Google Earth imagery.
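For orientation, the RPC model whose parameters are estimated from the 3D–2D correspondences has the following general form. The notation is assumed here: (r_n, c_n) are normalized image row and column, (X_n, Y_n, Z_n) are normalized ground coordinates, and each P_k is a cubic polynomial with 20 coefficients. The abstract does not state which coefficient ordering or normalization the authors use, so this is the commonly used form rather than their exact formulation.

```latex
% requires amsmath for \substack
\[
  r_n = \frac{P_1(X_n, Y_n, Z_n)}{P_2(X_n, Y_n, Z_n)}, \qquad
  c_n = \frac{P_3(X_n, Y_n, Z_n)}{P_4(X_n, Y_n, Z_n)},
\]
\[
  P_k(X, Y, Z) = \sum_{\substack{i, j, l \ge 0 \\ i + j + l \le 3}}
      a^{(k)}_{ijl}\, X^{i} Y^{j} Z^{l},
  \qquad k = 1, \dots, 4 .
\]
```

Multiplying each equation by its denominator makes it linear in the unknown coefficients once one denominator coefficient is fixed (conventionally the constant term is set to 1), which is why the parameters can be computed directly from a sufficient number of correspondences.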
- Conference Article
1
- 10.1109/icvrv.2013.34
- Sep 1, 2013
In this paper, we propose a novel depth recovery method that combines multi-view stereo with focusing. Inspired by 4D light field theory, we identify the relationship between classical multi-view stereo (MVS) and depth from focus (DFF) methods and examine their different frequency distributions in 2D light field space. We then separate depth recovery into two steps. In the first step, we choose a set of depth candidates using an existing multi-view stereo method. In the second step, the depth from focus algorithm is employed to determine the final depth. As is well known, multi-view stereo and depth from focus need different kinds of input images, which cannot be acquired at the same time with a traditional imaging system. We address this issue by using a camera array system and synthetic aperture photography, so that both multi-view images and distinct defocus-blur images can be captured at the same time. Experimental results show that the proposed method takes advantage of both MVS and DFF and that the recovered depth is better than that of traditional methods.
- Conference Article
4
- 10.1117/12.2083202
- Mar 17, 2015
- Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
This paper presents a new multi-view stereo image synthesis method using binocular symmetric hole filling. In autostereoscopic displays, multi-view synthesis is needed to provide multiple perspectives of the same scene, as viewed from multiple viewing positions. In an image warped to a distant virtual viewpoint, it is difficult to generate visually plausible multi-view stereo images because very large hole regions (i.e., disoccluded regions) can be induced. In addition, binocular asymmetry between the synthesized left-eye and right-eye images is one of the critical factors leading to visual discomfort in stereoscopic viewing. In this paper, we maintain binocular symmetry using the already filled regions in an adjacent view. The proposed method introduces binocular symmetric hole filling based on global optimization for binocular symmetry in the synthesized multi-view stereo images. The experimental results show that the proposed method outperforms existing methods.
- Research Article
2
- 10.1109/access.2020.3004431
- Jan 1, 2020
- IEEE Access
Image-based rendering (IBR) attempts to synthesize novel views using a set of observed images. Some IBR approaches (such as light fields) have yielded impressive high-quality results on small-scale scenes with dense photo capture. However, available wide-baseline IBR methods are still restricted by the low geometric accuracy and completeness of multi-view stereo (MVS) reconstruction on low-textured and non-Lambertian surfaces. The issues become more significant in large-scale outdoor scenes due to challenging scene content, e.g., buildings, trees, and sky. To address these problems, we present a novel IBR algorithm that consists of two key components. First, we propose a novel depth refinement method that combines MVS depth maps with monocular depth maps predicted via deep learning. A lookup table remap is proposed for converting the scale of the monocular depths to be consistent with the scale of the MVS depths. Then, the rescaled monocular depth is used as the constraint in the minimum spanning tree (MST)-based nonlocal filter to refine the per-view MVS depth. Second, we present an efficient shape-preserving warping algorithm that uses superpixels to generate the warped images and blend expected novel views of scenes. The proposed method has been evaluated on public MVS and view synthesis datasets, as well as newly captured large-scale outdoor datasets. In comparison with state-of-the-art methods, the experimental results demonstrated that the proposed method can obtain more complete and reliable depth maps for the challenging large-scale outdoor scenes, thereby resulting in more promising novel view synthesis.
- Research Article
13
- 10.1587/transinf.2014edp7409
- Jan 1, 2015
- IEICE Transactions on Information and Systems
SUMMARY Window matching methods for estimating 3D points are the most important factor affecting the accuracy, robustness, and computational cost of Multi-View Stereo (MVS) algorithms. Most existing MVS algorithms employ window matching based on Normalized Cross-Correlation (NCC) to estimate the depth of a 3D point. NCC-based window matching estimates the displacement between matching windows with sub-pixel accuracy by linear/cubic interpolation, which does not represent accurate sub-pixel values of matching windows. This paper proposes a highly accurate window matching technique for MVS that uses Phase-Only Correlation (POC) with geometric correction. The accurate sub-pixel displacement between two matching windows can be estimated by fitting the analytical correlation peak model of the POC function. The proposed method also corrects the geometric transformations of matching windows by taking into consideration the 3D shape of the target object. The proposed geometric correction makes it possible to achieve accurate 3D reconstruction from multi-view images even for images with large transformations. The proposed method demonstrates more accurate 3D reconstruction from multi-view images than conventional methods.
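The core of Phase-Only Correlation can be sketched in a few lines of NumPy. This is a minimal illustration that recovers only an integer, circular shift between two windows; the sub-pixel peak-model fitting and the geometric correction described in the abstract are deliberately left out, and the function names are chosen here for illustration.

```python
import numpy as np

def phase_only_correlation(f, g):
    """Return the POC surface of two same-sized 2D windows."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    cross = F * np.conj(G)
    # Keep only the phase; the small constant avoids division by zero.
    cross /= np.abs(cross) + 1e-12
    return np.real(np.fft.ifft2(cross))

def integer_shift(f, g):
    """Estimate (dy, dx) such that f is approximately np.roll(g, (dy, dx), axis=(0, 1))."""
    poc = phase_only_correlation(f, g)
    peak = np.unravel_index(np.argmax(poc), poc.shape)
    # Peak indices past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, poc.shape))
```

Because the magnitude spectrum is normalized away, the POC surface of two shifted copies is close to a single sharp peak, and fitting a peak model around that maximum is what allows a sub-pixel estimate.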
- Research Article
- 10.1080/2150704x.2023.2283901
- Nov 23, 2023
- Remote Sensing Letters
With the increased availability of multi-view satellite images, the number of investigations on 3D urban scene reconstruction from multiple satellite images is also increasing. Conventional Multi-View Stereo (MVS) pipelines require the calibrated pose information of the satellite cameras to determine the epipolar geometry and the 3D structure of the stereo correspondences. In this study, we propose a novel Monocular Height estimation and Fusion (MHF) method for 3D reconstruction from uncalibrated multi-view satellite images. By employing a learned monocular depth network, the proposed method first obtains the height map of each satellite image. Second, all height maps obtained from the multi-view images are fused into a refined height map in each image plane. To fuse the height maps, all maps are affine-transformed to a virtual reference coordinate system, and the transformed maps are then projected onto the image plane of each camera coordinate system. The monocular depth network was trained and evaluated on the Data Fusion Contest 2019 (DFC19) dataset, which includes Jacksonville, FL, and Omaha, NE. We also evaluate on the ATL-SN4 dataset covering Atlanta, GA, to test on new urban scenes not seen during training.
- Conference Article
- 10.1109/dcc.1997.582103
- Jan 1, 1997
Summary form only given. Multiview stereo imaging uses arrays of cameras to capture scenes from multiple perspectives. This form of imagery is used in systems that allow the user to survey the scene, for example by head motion. Very little work has been reported on compression schemes for multiview images. Multiview image sets tend to be very large because they may contain several hundred views, but there is considerable redundancy among the views, which makes them highly compressible. This paper compares methods for compressing large multiview stereo image sets. There is an obvious similarity between multiview image sets and video sequences. As a baseline, we compressed a set of multiview stereo images with JPEG applied to each image individually and with MPEG-1 applied to the whole set. With MPEG-1, the average bits per pixel were reduced by roughly a factor of two relative to individual frame compression, at constant mean square error (MSE). Stereo-specific perceptual distortions can be viewed in anaglyph representations of the data set. Another method, unique to this data type, is based on residual coding with respect to a synthetic "panoramic still" containing information from all of the images in the set. In this method we synthesize a single panoramic image from all of the members of a registered set, code the panoramic image, and then code the residual images formed by subtracting the individual images from the corresponding position on the panorama. Initial results with this method appear to give an MSE rate-distortion curve similar to that of the MPEG-based techniques. However, the panoramic still method is inherently random access.