Abstract

An increasing number of convolutional neural networks (CNNs) focus specifically on capturing contextual features (con. feat) to improve performance in semantic segmentation tasks. However, high-level con. feat are biased towards encoding features of large objects, disregard spatial details, and have a limited capacity to discriminate between easily confused classes (e.g., trees and grasses). We therefore incorporate low-level features (low. feat) and class-specific discriminative features (dis. feat) to boost model performance further: low. feat help the model recover spatial information, and dis. feat reduce class confusion during segmentation. To this end, we propose a novel deep multi-feature learning framework, dubbed MFNet, for the semantic segmentation of very-high-resolution (VHR) remote sensing images (RSIs). MFNet adopts a multi-feature learning mechanism to learn more complete features, including con. feat, low. feat, and dis. feat. More specifically, alongside a widely used context aggregation module for capturing con. feat, we append two branches for learning low. feat and dis. feat. One branch learns low. feat at a shallow layer of the backbone network through local contrast processing, while the other groups con. feat and then optimizes each class individually to generate dis. feat with better inter-class discriminative capability. Extensive quantitative and qualitative evaluations demonstrate that the proposed MFNet outperforms most state-of-the-art models on the ISPRS Vaihingen and Potsdam datasets. In particular, thanks to multi-feature learning, our model achieves an overall accuracy of 91.91% on the Potsdam test set with VGG16 as the backbone, performing favorably against advanced models built on ResNet101.
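
For readers unfamiliar with the metric, the overall accuracy quoted above is the fraction of pixels whose predicted class matches the reference label. The following is a minimal sketch in plain Python, assuming label maps are nested lists of integer class IDs. The function name `overall_accuracy` and the toy data are illustrative and do not come from the paper, and any benchmark-specific handling of ignored pixels is left out.

```python
def overall_accuracy(pred, ref):
    """Fraction of pixels where the predicted class equals the reference class."""
    if len(pred) != len(ref):
        raise ValueError("prediction and reference differ in number of rows")
    correct = 0
    total = 0
    for pred_row, ref_row in zip(pred, ref):
        if len(pred_row) != len(ref_row):
            raise ValueError("prediction and reference rows differ in length")
        for p, r in zip(pred_row, ref_row):
            correct += int(p == r)  # count matching pixels
            total += 1
    if total == 0:
        raise ValueError("empty label maps")
    return correct / total


# Toy 2x3 label maps with hypothetical class IDs (e.g. 0 = tree, 1 = grass).
pred = [[0, 1, 1], [0, 0, 1]]
ref = [[0, 1, 0], [0, 0, 1]]
print(overall_accuracy(pred, ref))
```

In this toy case five of the six pixels agree, so the score is 5/6. Overall accuracy weights every pixel equally, so it can hide confusion between small or rare classes.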

Highlights

  • High-resolution remote sensing image analysis plays an important role in the geosciences, including disaster control, environmental monitoring, and the utilization and protection of state-owned land and resources

  • We propose a novel deep multi-feature learning network based on fully convolutional networks (FCNs) [15], dubbed MFNet and shown in Figure 1b, for the semantic segmentation of VHR remote sensing images (RSIs)

  • The framework consists of a backbone network and three parts, each learning one kind of feature: contextual features, class-specific discriminative features, and low-level features


Summary

Introduction

High-resolution remote sensing image analysis plays an important role in the geosciences, including disaster control, environmental monitoring, and the utilization and protection of state-owned land and resources. With the advancement of photography and sensor technologies, the accessibility of very-high-resolution (VHR) remote sensing images (RSIs) has opened new horizons for the computer vision community and increased the demand for effective analyses [1]. Convolutional neural networks (CNNs) learn visual representations automatically in an end-to-end manner and extend readily to downstream tasks such as image recognition [2,3] and semantic segmentation [4,5,6]. CNNs have made remarkable progress in the semantic segmentation of VHR images [1,7,8,9,10,11]. An increasing number of models (shown schematically in Figure 1a) focus on capturing contextual information, or long-range dependencies, which is capable

