Abstract

Fast and efficient detection and reconstruction of buildings have become essential in real-time applications such as navigation, 3D rendering, augmented reality, and 3D smart cities. This study proposes a Deep Learning (DL)-based framework that simultaneously detects, localizes, and estimates the height of buildings from a single aerial image. The framework is based on a Y-shaped Convolutional Neural Network (Y-Net) with one encoder and two decoders. The network takes a single RGB image as input and outputs the predicted building heights together with the rooflines in three classes: eave, ridge, and hip lines. The heights and rooflines extracted by the Y-Net are then used to reconstruct the buildings in 3D at the third Level of Detail (LoD2). The main steps of the approach are data preparation, CNN training, and 3D reconstruction. The experiments use airborne data from Potsdam provided by ISPRS. For the predicted heights, the results show an average Root Mean Square Error (RMSE) of about 3.8 m and a Normalized Median Absolute Deviation (NMAD) of about 1.3 m. The overall accuracy of the extracted rooflines is about 86%.
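The two height-error measures quoted in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition NMAD = 1.4826 · median(|Δh − median(Δh)|), where Δh is the per-pixel difference between predicted and reference height; the abstract does not spell out the formula the authors used, and the function names `rmse` and `nmad` are hypothetical.

```python
import math
import statistics


def rmse(predicted, reference):
    """Root Mean Square Error of predicted heights against reference heights."""
    errors = [p - r for p, r in zip(predicted, reference)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def nmad(predicted, reference):
    """Normalized Median Absolute Deviation (assumed standard definition).

    The factor 1.4826 makes NMAD comparable to a standard deviation
    when the errors are normally distributed.
    """
    errors = [p - r for p, r in zip(predicted, reference)]
    median_error = statistics.median(errors)
    deviations = [abs(e - median_error) for e in errors]
    return 1.4826 * statistics.median(deviations)


# Made-up heights in metres, with one gross outlier in the last position.
predicted_heights = [10.0, 12.0, 8.0, 30.0]
reference_heights = [9.0, 12.5, 8.0, 20.0]

print(rmse(predicted_heights, reference_heights))
print(nmad(predicted_heights, reference_heights))
```

Because NMAD is built on medians, a few gross height errors affect it far less than they affect RMSE, which is one plausible reason the reported NMAD (about 1.3 m) is well below the reported RMSE (about 3.8 m).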

Highlights

  • Buildings are the most prominent objects in urban scenes, and measuring and analyzing their 3D shapes and positions is essential for many applications such as 3D map updating, urban management, smart cities, monitoring, navigation and mapping, civil infrastructure inspection, and scene understanding

  • To evaluate the performance of the proposed approach, an airborne dataset from Potsdam, Germany, provided by ISPRS (ISPRS, 2018), is used. It consists of very high-resolution true orthophoto tiles with a ground sampling distance (GSD) of 5 cm and corresponding Digital Surface Models (DSMs) derived from dense image matching

  • The training dataset includes 4,800 tiles of RGB images, nDSMs, and rooflines, which data augmentation increases to 24,000 tiles of 224×224 pixels
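The fivefold growth from 4,800 to 24,000 tiles can be sketched as follows. This is only an illustration: the page does not say which transforms the authors applied, so the four geometric variants below (two flips and two rotations) are an assumption, and `augment_sample` is a hypothetical name. The point it shows is that any geometric transform has to be applied identically to the RGB tile, the nDSM, and the roofline mask so that the three stay pixel-aligned.

```python
import numpy as np


def augment_sample(rgb, ndsm, rooflines):
    """Return the original sample plus four geometric variants (5 in total).

    The choice of transforms is an assumption for illustration only.
    Each transform is applied to all three arrays so they stay aligned.
    """
    transforms = [
        lambda a: a,                   # original tile
        lambda a: np.flip(a, axis=1),  # horizontal flip
        lambda a: np.flip(a, axis=0),  # vertical flip
        lambda a: np.rot90(a, k=1),    # 90 degree rotation in the image plane
        lambda a: np.rot90(a, k=2),    # 180 degree rotation in the image plane
    ]
    return [(t(rgb), t(ndsm), t(rooflines)) for t in transforms]


# Synthetic 224x224 tile standing in for one training sample.
rgb_tile = np.arange(224 * 224 * 3).reshape(224, 224, 3)
ndsm_tile = np.arange(224 * 224, dtype=float).reshape(224, 224)
roofline_tile = np.zeros((224, 224), dtype=np.uint8)

samples = augment_sample(rgb_tile, ndsm_tile, roofline_tile)
print(len(samples), 4800 * len(samples))
```

With five variants per tile, 4,800 source tiles yield the 24,000 training tiles mentioned above.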


Summary

Introduction

Buildings are the most prominent objects in urban scenes, and measuring and analyzing their 3D shapes and positions is essential for many applications such as 3D map updating, urban management, smart cities, monitoring, navigation and mapping, civil infrastructure inspection, and scene understanding. Remotely sensed data such as stereo aerial and satellite images or LiDAR data are the main sources for extracting 3D information about urban objects using photogrammetry techniques. These data sources are not available everywhere, and generating up-to-date Digital Surface Models (DSMs) takes considerable effort, time, and cost, especially for large areas. Sometimes it is not possible to capture images from different views to reconstruct 3D models because of obstacles, occluded areas, or limited acquisition time. To address this issue, many investigations attempt to reconstruct 3D scenes from monocular images, such as single satellite and aerial images, as a low-cost solution for rapid 3D mapping and fast 3D visualization and rendering of urban scenes.

The double-blind peer-review was conducted on the basis of the full paper.
