Abstract
Stereo dense matching is a fundamental task for 3D scene reconstruction. Recently, deep-learning-based methods have proven effective on benchmark datasets such as Middlebury and KITTI stereo. However, training datasets for aerial photogrammetry are hard to find, because generating ground truth data for real scenes is challenging. In the photogrammetry community, many evaluation methods use digital surface models (DSM) to generate the ground truth disparity for the stereo pairs, but the interpolation this involves may introduce errors into the ground truth disparity. In this paper, we publish a stereo dense matching dataset based on the ISPRS Vaihingen dataset and use it to evaluate several traditional and deep-learning-based methods. The evaluation shows that learning-based methods significantly outperform traditional methods when fine-tuning is done on a similar landscape. The benchmark also investigates the impact of the base-to-height ratio on the performance of the evaluated methods. The dataset is available at https://github.com/whuwuteng/benchmark_ISPRS2021.
Highlights
Dense matching is a traditional topic in 3D reconstruction, which can be performed in stereo (Scharstein, Szeliski, 2002) or multi-view stereo (MVS) (Jensen et al., 2014).
We focus on stereo dense matching in the specific case of epipolar stereo pairs, as most recent deep learning approaches are limited to this simple configuration.
Evaluation. After generating the epipolar image pairs and the corresponding ground truth disparity images from LiDAR, we evaluate several traditional and learning-based methods on this dataset:
1. MICMAC: a variant of semi-global matching (SGM) implemented in MicMac (Pierrot-Deseilligny, Paparoditis, 2006), using normalized cross-correlation (NCC) as [...]
[Figure: (a) disparity without occluded-point filtering, shown after nearest interpolation; (b) disparity with occluded-point filtering, shown after nearest interpolation.]
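For readers unfamiliar with the normalized cross-correlation (NCC) mentioned above, the following is a minimal sketch of its textbook definition for two image patches. It is not MicMac's implementation, which is not shown here; the function name `ncc` and the `eps` guard for textureless patches are assumptions made for illustration.

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-12):
    """Textbook zero-mean normalized cross-correlation of two equally sized patches.

    Returns a value in [-1, 1]; 1 means the patches are identical up to an
    affine change of brightness (gain and offset).
    NOTE: `ncc` and `eps` are illustrative names, not taken from MicMac.
    """
    a = np.asarray(patch_a, dtype=np.float64).ravel()
    b = np.asarray(patch_b, dtype=np.float64).ravel()
    # Remove the mean so the score ignores brightness offsets.
    a = a - a.mean()
    b = b - b.mean()
    # Normalize by the patch energies so the score ignores gain.
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom < eps:
        # At least one patch is constant: correlation is undefined, return 0.
        return 0.0
    return float((a * b).sum() / denom)
```

Because the score is invariant to gain and offset, it tolerates radiometric differences between the two images of a pair, which is one common reason for choosing it as a matching cost.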
Summary
Dense matching is a traditional topic in 3D reconstruction, which can be performed in stereo (with only two views) (Scharstein, Szeliski, 2002) or multi-view stereo (MVS) (Jensen et al., 2014). We focus on stereo dense matching in the specific case of epipolar stereo pairs (where expected correspondences lie on the same lines of the two images), as most recent deep learning approaches are limited to this simple configuration. The recent successes of deep-learning-based dense matching methods in the computer vision community (Laga et al., 2020) raise the question of their applicability in the geospatial context. This paper investigates this question by comparing traditional and machine learning (especially deep learning) dense matching techniques on geospatial data.
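The epipolar configuration described above reduces matching to a one-dimensional search along an image row. The sketch below illustrates this with a plain winner-take-all search using the sum of absolute differences (SAD); it is not one of the methods evaluated in the paper. The function name `match_along_row`, the window size, the search range, and the sign convention (a left pixel at column x corresponds to column x - d in the right image) are all assumptions made for illustration.

```python
import numpy as np

def match_along_row(left, right, row, col, half=2, max_disp=16):
    """Winner-take-all disparity for one pixel of an epipolar (rectified) pair.

    Candidates are searched only on the same row of the right image.
    ASSUMPTION: x_right = x_left - d with d >= 0; other datasets may use
    the opposite sign. The window around (row, col) must fit inside the image.
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    ref = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:
            # Candidate window would leave the image on the left side.
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        cost = np.abs(ref - cand).sum()  # SAD matching cost
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Usage: a synthetic pair where the left image is the right image shifted by 3 columns.
rng = np.random.default_rng(0)
right_img = rng.random((20, 40))
left_img = np.roll(right_img, 3, axis=1)
d = match_along_row(left_img, right_img, row=10, col=20)
```

Restricting the search to a single row is what makes the epipolar case simple: each pixel has at most `max_disp + 1` candidates instead of a two-dimensional search region.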
More From: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences