Abstract

Land cover classification mapping is the process of assigning labels to different types of land surfaces based on overhead imagery. However, acquiring reference samples through fieldwork for ground truth can be costly and time‐intensive. Additionally, annotating high‐resolution satellite images poses challenges, as certain land cover types are difficult to discern solely from nadir images. To address these challenges, this study examined the feasibility of using street‐level imagery to support the collection of reference samples and identify land cover. We utilized 18,022 images captured in Japan, spanning 14 different land cover classes. Our approach involved convolutional neural networks based on Inception‐v4 and DenseNet, as well as the Transformer‐based Vision Transformer and Swin Transformer, both with and without pre‐trained weights and fine‐tuning. Additionally, we explored explainability through Gradient‐weighted Class Activation Mapping (Grad‐CAM). Our results indicate that the Vision Transformer was the most effective method, achieving an overall accuracy of 86.12% and allowing for full explainability of land cover targets within an image. This paper proposes a promising solution for land cover classification from street‐level imagery, which can be used for semi‐automatic reference sample collection from geo‐tagged street‐level photos.
