Abstract

Estimation of the Digital Surface Model (DSM) and building heights from single-view aerial imagery is a challenging, inherently ill-posed problem that we address in this paper by resorting to machine learning. We propose an end-to-end trainable convolutional-deconvolutional deep neural network architecture that learns a mapping from a single aerial image to a DSM for analysis of urban scenes. We perform multisensor fusion of aerial optical and aerial Light Detection and Ranging (Lidar) data to prepare the training data for our pipeline. Dataset quality is key to successful estimation performance. Typically, a substantial amount of misregistration artifacts is present due to georeferencing/projection errors, sensor calibration inaccuracies, and scene changes between acquisitions. To overcome these issues, we propose a registration procedure that relies on Mutual Information to improve the alignment of Lidar and optical data, followed by a Hough transform-based validation step to adjust misregistered image patches. We validate our building height estimation model on a high-resolution dataset captured over central Dublin, Ireland: a Lidar point cloud from 2015 and optical aerial images from 2017. These data allow us to validate the proposed registration procedure and to perform 3D model reconstruction from single-view aerial imagery. We also report state-of-the-art performance of our proposed architecture on several popular DSM estimation datasets.
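The abstract names the core component of the method: a convolutional-deconvolutional network trained end-to-end to regress a DSM from a single aerial image. As a rough illustration only, a minimal encoder-decoder sketch of this kind of single-image height regression could look as follows; the framework (PyTorch), layer widths, and L1 loss are assumptions made for illustration, not the authors' exact architecture or training setup.

```python
# Minimal encoder-decoder sketch for single-image DSM regression (PyTorch).
# Layer widths, depths, and the L1 loss are illustrative assumptions, not the
# authors' exact architecture.
import torch
import torch.nn as nn

class DSMNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional encoder: RGB aerial patch -> low-resolution features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Deconvolutional decoder: features -> one-channel height map (DSM).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# One training step against a Lidar-derived DSM patch (random tensors as stand-ins).
model = DSMNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
rgb = torch.randn(4, 3, 256, 256)      # batch of aerial image patches
dsm_gt = torch.randn(4, 1, 256, 256)   # corresponding Lidar-derived DSM patches
loss = nn.functional.l1_loss(model(rgb), dsm_gt)
loss.backward()
optimizer.step()
```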

Highlights

  • High-resolution orthorectified imagery acquired by aerial or satellite sensors is well known to be a rich source of information with high geolocation accuracy

  • We report a brief review of the registration techniques applied to point clouds and aerial imagery (Section 2.5) as this is inevitably the first step in dealing with incoming new data, see, e.g., Figure 2

  • Of the original 2366 patches, 1999 patches are used for training, and 367 patches are used for testing and for comparisons between several preprocessing pipeline scenarios: no registration, registration (MI), and registration with invalid patch adjustment
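For context on the "registration (MI)" scenario compared in the last highlight, a minimal sketch of Mutual-Information-based alignment between an optical patch and a Lidar-derived DSM patch is shown below; the joint-histogram MI estimate, bin count, and exhaustive integer-shift search are illustrative assumptions, not the paper's exact registration procedure.

```python
# Minimal Mutual-Information (MI) alignment sketch: search small integer shifts
# and keep the one maximizing MI between the optical patch (grayscale) and the
# Lidar-derived DSM patch. Bin count and search radius are illustrative choices.
import numpy as np

def mutual_information(a, b, bins=32):
    """MI between two equally shaped 2D arrays, estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def best_shift(optical, dsm, radius=8):
    """Return the (dy, dx) shift of the DSM patch that maximizes MI, plus the MI value."""
    best, best_mi = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(np.roll(dsm, dy, axis=0), dx, axis=1)
            mi = mutual_information(optical, shifted)
            if mi > best_mi:
                best_mi, best = mi, (dy, dx)
    return best, best_mi
```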



Introduction

High-resolution orthorectified imagery acquired by aerial or satellite sensors is well known to be a rich source of information with high geolocation accuracy. These images are widely used in geographic information systems (GIS), for instance, for detection of man-made objects (buildings), urban monitoring, and planning. Stereo image pairs [1], structure from motion (SfM) [2], or Light Detection and Ranging (Lidar) laser-scanning technology are traditionally used to obtain point clouds. These methods provide 3D information with various levels of accuracy, which can be converted to a Digital Surface Model (DSM) that in turn can be stored as grayscale imagery. The height information is extracted via triangulation from pairs of consecutive views, so single-view imagery cannot be used by these techniques.
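To make the "point cloud converted to a DSM stored as grayscale imagery" step concrete, a minimal rasterization sketch is given below; the cell size and the max-height-per-cell rule are common conventions assumed for illustration rather than details taken from this paper.

```python
# Minimal sketch: rasterize a Lidar point cloud (x, y, z) into a DSM grid by
# keeping the maximum height per cell. Cell size is an illustrative choice.
import numpy as np

def points_to_dsm(points, cell=0.5):
    """points: (N, 3) array of x, y, z coordinates; returns a 2D height grid."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    cols = ((x - x.min()) / cell).astype(int)
    rows = ((y.max() - y) / cell).astype(int)   # image row 0 = northernmost cells
    dsm = np.full((rows.max() + 1, cols.max() + 1), np.nan)
    for r, c, h in zip(rows, cols, z):
        if np.isnan(dsm[r, c]) or h > dsm[r, c]:
            dsm[r, c] = h                        # keep the highest return per cell
    return dsm  # can be normalized and stored as a grayscale image
```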
