Abstract

It is essential for researchers to interpret remote sensing images (RSIs) correctly and to label their component parts with precise semantics. Although FCN-like (Fully Convolutional Network) deep architectures have been widely applied, for example in the perception stacks of autonomous cars, two challenges remain in the semantic segmentation of RSIs. The first is to identify details in high-resolution images with complex scenes and to resolve class-mismatch issues; the second is to capture object edges finely without confusion from the surroundings. HRNet maintains high-resolution representations by fusing feature information across parallel multi-resolution convolution branches. We adopt HRNet as a backbone and propose a Class-Oriented Region Attention Module (CRAM) and a Class-Oriented Context Fusion Module (CCFM), which analyze the relationships between classes and patch regions and between classes and local or global pixels, respectively, thereby enhancing the model's perception of fine details in aerial images. We leverage these modules to build an end-to-end semantic segmentation model for aerial images and validate it on the ISPRS Potsdam and Vaihingen datasets. The experimental results show that our model improves on the baseline accuracy and outperforms several commonly used CNN architectures.

Highlights

  • In the domain of remote sensing, correctly interpreting images is a key requirement for researchers

  • To tackle both challenges, we develop and apply the Class-Oriented Region Attention Module (CRAM) and Class-Oriented Context Fusion Module (CCFM) on top of HRNet, which links convolutional branches of different resolutions in parallel and lets them exchange information, so that the network retains robust, high-resolution representations throughout feature extraction

  • We introduce CCFM, which leverages an attention mechanism to better interpret the relationship between classes and specific pixels, facilitating the capture of semantic information from long-range dependencies and providing a multi-scale contextual representation for the semantic segmentation of aerial images, allowing detailed identification of object outlines
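To make the class-to-pixel attention idea concrete, the following is a minimal NumPy sketch of cross-attention in which one learned embedding per semantic class attends over flattened pixel features to produce a class-specific context vector. This is an illustrative assumption of the general mechanism, not the paper's actual CRAM/CCFM implementation; all names, shapes, and the single-head formulation are chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def class_oriented_attention(class_queries, pixel_feats):
    """Single-head cross-attention from class embeddings to pixel features.

    class_queries: (C, d) -- one learned embedding per semantic class
    pixel_feats:   (N, d) -- flattened H*W pixel features from the backbone
    returns:       (C, d) -- one context vector per class, aggregating the
                             pixels most relevant to that class
    """
    d = class_queries.shape[-1]
    scores = class_queries @ pixel_feats.T / np.sqrt(d)  # (C, N) similarities
    attn = softmax(scores, axis=-1)                      # class-to-pixel weights
    return attn @ pixel_feats                            # (C, d) class contexts

rng = np.random.default_rng(0)
ctx = class_oriented_attention(rng.normal(size=(6, 32)),   # 6 classes
                               rng.normal(size=(64, 32)))  # 8x8 feature map
print(ctx.shape)  # (6, 32)
```

Because the attention weights are computed over all N pixels at once, each class context vector can draw on arbitrarily distant pixels, which is what gives this style of module its long-range-dependency behavior.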


Introduction

In the domain of remote sensing, correctly interpreting images is a key requirement for researchers. Using semantic segmentation to better grasp the semantic information in images can help researchers make breakthroughs in areas such as tracking changes in buildings [1,2,3], extracting road-network information [4,5,6], urban planning [7,8], zoning of urban land parcels [9,10,11], water-coverage surveys [12,13], and so on. Unlike traditional methods that rely on hand-crafted features to distill information, FCN-based semantic segmentation algorithms, which classify each pixel of an image end-to-end and efficiently acquire feature information, have made significant breakthroughs over the years and are well established in the fields of autonomous driving and virtual simulation.
