Abstract

Recent applications of fully convolutional networks (FCNs) have been shown to improve the semantic segmentation of very high resolution (VHR) remote-sensing images, owing to their excellent feature representation and end-to-end pixel labeling capabilities. While many FCN-based methods concatenate features from multilevel encoding stages to refine the coarse labeling results, the semantic gap between features of different levels and the selection of representative features are often overlooked, leading to redundant information and unexpected classification results. In this article, we propose an attention-guided label refinement network (ALRNet) for improved semantic labeling of VHR images. ALRNet follows the encoder–decoder paradigm and progressively refines the coarse labeling maps of different scales by using a channelwise attention mechanism. A novel attention-guided feature fusion module based on the squeeze-and-excitation (SE) module is designed to fuse higher level and lower level features. In this way, the semantic gaps among features of different levels are reduced, and the category discrimination of each pixel in the lower level features is strengthened, which benefits subsequent label refinement. ALRNet is evaluated on three public datasets, including two ISPRS 2-D semantic labeling datasets and the Wuhan University (WHU) aerial building dataset. Results demonstrate that ALRNet achieves promising segmentation performance in comparison with state-of-the-art deep learning networks. The source code of ALRNet is made publicly available for further studies.
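The abstract only outlines the fusion design, so the following PyTorch sketch is our own illustration of an SE-style, attention-guided fusion of a higher level and a lower level feature map. The class name AttentionGuidedFusion, its parameters (reduction, proj), and the choice of additive fusion are assumptions for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGuidedFusion(nn.Module):
    """Illustrative SE-style fusion: the higher level (coarser, more
    semantic) feature produces channelwise weights that recalibrate the
    lower level (finer, less discriminative) feature before fusion.
    A sketch of the idea described in the abstract, not the paper's
    exact module."""

    def __init__(self, high_channels, low_channels, reduction=16):
        super().__init__()
        # Squeeze: global average pooling over the high-level feature.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: two-layer bottleneck emitting one weight per
        # channel of the low-level feature.
        self.fc = nn.Sequential(
            nn.Linear(high_channels, low_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(low_channels // reduction, low_channels),
            nn.Sigmoid(),
        )
        # Project the high-level feature to the low-level channel
        # count so the two maps can be summed.
        self.proj = nn.Conv2d(high_channels, low_channels, kernel_size=1)

    def forward(self, high, low):
        # high: (B, Ch, H/2, W/2), low: (B, Cl, H, W)
        b, c, _, _ = high.shape
        w = self.fc(self.pool(high).view(b, c))   # (B, Cl) channel weights
        low = low * w.view(b, -1, 1, 1)            # channelwise recalibration
        high_up = F.interpolate(self.proj(high), size=low.shape[2:],
                                mode='bilinear', align_corners=False)
        return low + high_up                       # fused feature map
```

Under these assumptions, a decoder step would call the module once per scale, e.g. `fuse = AttentionGuidedFusion(512, 256)` followed by `y = fuse(torch.randn(1, 512, 16, 16), torch.randn(1, 256, 32, 32))`, yielding a (1, 256, 32, 32) fused map from which the label map at that resolution could be refined.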

Highlights

  • The development of remote-sensing technologies for Earth observation has significantly increased the accessibility of very high spatial resolution (VHR) images [1], which opens up new horizons for a better understanding of our changing world.

  • While tests of the attention-guided label refinement network (ALRNet) on the International Society for Photogrammetry and Remote Sensing (ISPRS) benchmarks have demonstrated its ability to deal with the multiclass classification problem in VHR images, we further validated ALRNet on the Wuhan University (WHU) building dataset for recognizing building objects from images of different spatial resolutions and band compositions.

  • We present ALRNet, an attention-guided label refinement network for VHR image segmentation.


Introduction

The development of remote-sensing technologies for Earth observation has significantly increased the accessibility of very high spatial resolution (VHR) images [1], which opens up new horizons for a better understanding of our changing world. Semantic segmentation, which assigns a semantic label to each pixel in an image, is one of the fundamental approaches to analyzing remote-sensing data [2] and plays an essential role in diverse applications, such as land cover/land use interpretation [3], disaster analysis, urban planning [4], and environment monitoring. Many efforts have been made in the past few decades to develop accurate semantic segmentation methods, including machine-learning-based methods [6], [7] and object-based analysis methods [8]. Nevertheless, accurate semantic labeling of VHR images remains challenging for several reasons. The high intraclass and low interclass spectral variation of complicated urban areas in VHR images makes it difficult to extract representative features of target objects [9]. Many methods depend on designing hand-crafted features [10]; however, hand-crafted features are usually low- or mid-level and are often unreliable for distinguishing objects in complicated circumstances [11].
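As a concrete illustration of the pixel-labeling formulation mentioned above, the toy snippet below (our own example; the class count and tensor shapes are arbitrary) turns per-pixel class scores into a label map by taking an argmax over the class dimension.

```python
import torch

# Toy example: a segmentation network emits one score per class at every
# pixel; the predicted label map is the argmax over the class dimension.
num_classes = 6                                   # e.g., an ISPRS-style label set
logits = torch.randn(2, num_classes, 256, 256)    # (batch, classes, height, width)
label_map = logits.argmax(dim=1)                  # (batch, height, width), one label per pixel
print(label_map.shape)                            # torch.Size([2, 256, 256])
```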
