Abstract

Weakly supervised semantic segmentation (WSSS) methods based on image-level labels can relieve the tedious burden of pixel-level annotation, and most of them rely on class activation maps (CAMs). However, generating high-quality CAMs for high-resolution remotely sensed imagery (HRSI) is challenging. In this article, we propose a WSSS method for building extraction from HRSI using image-level labels. The proposed method, termed MSG-SR-Net, integrates two novel modules, multiscale generation (MSG) and superpixel refinement (SR), to obtain high-quality CAMs that provide reliable pixel-level training samples for the subsequent semantic segmentation steps. The MSG module uses global semantic information to guide feature learning at multiple levels and then generates a CAM from the features of each level, yielding multiscale CAMs. This component effectively suppresses class-irrelevant noise and strengthens the use of useful information in the multilevel features. The SR module takes advantage of superpixels to improve the multiscale CAMs in terms of target integrity and detail preservation. Extensive experiments on two public building datasets demonstrate that the proposed modules enable MSG-SR-Net to obtain more complete and accurate CAMs for building extraction. Moreover, the experimental results show that the proposed method achieves an F1-score of over 67% and outperforms other weakly supervised methods in effectiveness and generalization ability.
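To make the idea of superpixel-based refinement concrete, the following is a minimal sketch and not the SR module of MSG-SR-Net itself. It assumes that a CAM and a precomputed superpixel label map of the same size are available, and it simply replaces each pixel's score with the mean score of its superpixel. The function name `refine_cam_with_superpixels` and the averaging rule are illustrative assumptions.

```python
import numpy as np

def refine_cam_with_superpixels(cam, segments):
    """Average CAM scores inside each superpixel (illustrative assumption,
    not the published SR module).

    cam:      float array of shape (H, W) with activation scores.
    segments: int array of shape (H, W); each value is a superpixel id >= 0.
    """
    flat_seg = segments.ravel()
    # Sum of CAM scores and pixel count per superpixel id.
    sums = np.bincount(flat_seg, weights=cam.ravel())
    counts = np.bincount(flat_seg)
    # Guard against ids that never occur (count 0) to avoid division by zero.
    means = sums / np.maximum(counts, 1)
    # Broadcast each superpixel's mean back to its pixels.
    return means[flat_seg].reshape(cam.shape)
```

Because every pixel in a superpixel receives the same score, activations tend to follow the image boundaries encoded in the superpixels, which is the intuition behind using them to improve object integrity.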

Highlights

  • Building extraction from high-resolution remotely sensed imagery (HRSI) plays a vital role in many important applications, such as population estimation, urbanization evaluation, and urban planning [1]

  • To illustrate the effectiveness of the proposed multi-scale generation (MSG) and super-pixel refinement (SR) modules of the MSG-SR-Net in obtaining class activation maps (CAMs), we carry out ablation experiments on both the WHU building dataset and the InriaAID building dataset

  • Adding only the MSG module to the baseline method yields the baseline+MSG method, which is used to analyze the impact of the MSG module

Summary

INTRODUCTION

Building extraction from high-resolution remotely sensed imagery (HRSI) plays a vital role in many important applications, such as population estimation, urbanization evaluation, and urban planning [1]. Weak annotations can take different forms. In this work, we focus on pixel-level building extraction using only image-level labels, which indicate the presence of object classes in an image but provide no information about their locations or boundaries. WSSS based on image-level labels is very difficult because precise spatial information must be inferred solely from object presence in the images. To this end, existing works usually rely on class activation maps (CAMs) to obtain object masks, which are then used as pseudo labels to train a semantic segmentation network. In HRSI, building objects and background areas are easily confused, which may introduce class-irrelevant noise into the low-level features of CNNs (e.g., excessive noisy texture).
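The CAM-to-pseudo-label pipeline mentioned above can be sketched as follows. This is a minimal illustration of the classic CAM formulation (a classifier-weighted sum of the last convolutional feature maps) followed by a fixed threshold, not the MSG-SR-Net procedure. The function names, the normalization by the peak value, and the threshold of 0.5 are assumptions made for the example.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """Classic CAM: weighted sum of feature maps for one class.

    features:      float array of shape (C, H, W) from the last conv layer.
    class_weights: float array of shape (C,), the classifier weights
                   of the target class (e.g., "building").
    """
    # Contract the channel axis: (C,) x (C, H, W) -> (H, W).
    cam = np.tensordot(class_weights, features, axes=1)
    # Keep only positive evidence for the class.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1]; leave an all-zero map unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def cam_to_pseudo_label(cam, threshold=0.5):
    """Binarize a normalized CAM into a pseudo mask (threshold is assumed)."""
    return (cam >= threshold).astype(np.uint8)
```

The resulting binary mask would play the role of the pixel-level pseudo label on which a semantic segmentation network is then trained. Noisy or incomplete CAMs translate directly into noisy pseudo labels, which is why CAM quality matters so much in this setting.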

Building Extraction with CNNs
Weakly Supervised Semantic Segmentation based on Image-Level Labels
PROPOSED METHOD
Multi-scale Generation Module
Super-pixel Refinement Module
Experimental Setting
Performance Evaluation
Methods
CONCLUSION