Abstract. The availability of semantic information about a cityscape is essential for understanding and analysing urban processes. Given the enormous amount of data involved, gathering such information automatically is important. Many building features can be obtained solely by visual inspection. It is therefore worthwhile to use recent advances in automatic image recognition to extract these properties. This paper proposes an optimized solution for classifying rooftops in aerial imagery, based on a deep learning model using Convolutional Neural Networks (CNNs). It describes the network architecture, the training procedure and the important hyperparameters. The model is analysed using advanced interpretability and explainability tools. Its superiority is demonstrated by comparing its performance against several state-of-the-art image classification architectures: CNN-based ones such as Xception and EfficientNet, purely Vision Transformer (ViT)-based ones such as BEiT, and hybrid architectures.