CNNs for remote extraction of urban features: A survey-driven benchmarking

Bipul Neupane, Jagannath Aryal, Abbas Rajabifard

doi:10.1016/j.eswa.2024.124751

Abstract

Accurate extraction of urban features such as buildings and roads underpins the digital twins of urban systems that support planning, monitoring, navigation, and decision-making. Such extraction involves training convolutional neural networks (CNNs) on high-resolution earth observation (EO) images. The spatial resolution of these images has reached the centimetre level, and CNNs are evolving rapidly in computer vision. The last 10 years of this development have produced both high-performance and computationally efficient CNNs, but they are rarely benchmarked under a uniform setting. We present a survey-driven benchmark of CNNs, starting with a systematic survey of 165 research articles to establish the state of the art in urban feature extraction. The survey identifies the most prominent urban feature, EO source, benchmark dataset, CNN-based deep learning configuration, and hyperparameters. Additional CNNs are then sought in the computer vision domain. The 65 CNNs identified from the survey and search are trained and evaluated in an encoder–decoder configuration on a benchmark dataset under uniform settings. The best-performing CNN then undergoes extensive hyperparameter tuning with six optimisers and nine loss functions. The tuned CNN is subsequently tested as an encoder in other state-of-the-art encoder–decoder networks. The CNNs and network configurations with the highest scores are further benchmarked on the Massachusetts Building and WHU Building datasets. The findings from this survey-driven benchmark will be useful to both academia and industry working in earth observation and computer vision.

Full Text