Abstract

With the increased availability of multi-view satellite images, the number of investigations on 3D urban scene reconstruction from multiple satellite images is also increasing. Conventional Multi-View Stereo (MVS) pipelines require calibrated pose information for the satellite cameras to determine the epipolar geometry and the 3D structure of the stereo correspondences. In this study, we propose a novel Monocular Height estimation and Fusion (MHF) method for 3D reconstruction from uncalibrated multi-view satellite images. Employing a learned monocular depth network, the proposed method first obtains a height map for each satellite image. Second, the height maps obtained from all views are fused into a refined height map in each image plane. To fuse the height maps, all maps are affine-transformed to a virtual reference coordinate system, and the transformed maps are then projected onto the image plane of each camera coordinate system. The monocular depth network was trained and evaluated on the Data Fusion Contest 2019 (DFC19) dataset, which includes Jacksonville, FL, and Omaha, NE. We also evaluate on the ATL-SN4 dataset covering Atlanta, GA, to test generalization to unseen urban scenes.
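The fusion step described in the abstract can be sketched as follows. This is a simplified illustration, not the paper's implementation: it warps each per-view height map into a common reference grid with a known 2D affine transform (the paper additionally projects the fused result back to each camera's image plane), and the per-pixel median fusion rule and all function names are assumptions for illustration.

```python
import numpy as np

def warp_affine_nearest(height_map, A, t, out_shape):
    """Warp a height map into the reference frame via an inverse 2D affine
    transform with nearest-neighbor sampling.
    A (2x2) and t (2,) map reference pixel coords -> source pixel coords.
    Reference pixels that fall outside the source map become NaN."""
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W]
    ref = np.stack([xs.ravel(), ys.ravel()]).astype(float)  # (2, H*W)
    src = A @ ref + t[:, None]                              # ref -> source
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    out = np.full(H * W, np.nan)
    valid = (sx >= 0) & (sx < height_map.shape[1]) & \
            (sy >= 0) & (sy < height_map.shape[0])
    out[valid] = height_map[sy[valid], sx[valid]]
    return out.reshape(H, W)

def fuse_heights(height_maps, transforms, out_shape):
    """Fuse per-view height maps: warp each into the common reference grid,
    then take the per-pixel median over the valid (non-NaN) samples."""
    warped = [warp_affine_nearest(h, A, t, out_shape)
              for h, (A, t) in zip(height_maps, transforms)]
    stack = np.stack(warped)                                # (n_views, H, W)
    with np.errstate(all="ignore"):
        return np.nanmedian(stack, axis=0)
```

A robust statistic such as the median suppresses per-view outliers (e.g. occluded facades or monocular prediction errors) that a plain mean would smear into the fused map.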
