Abstract

Semantic segmentation in aerial images has become an indispensable part of remote sensing image understanding owing to its extensive application prospects. Achieving better segmentation requires jointly reasoning about 2-D appearance and 3-D information, as well as acquiring discriminative global context. However, previous approaches require accurate elevation data (e.g., the Digital Surface Model (DSM) or normalized DSM (nDSM)) as additional inputs to segment semantics, which severely limits their applicability. On the other hand, due to the varied forms of objects in complex scenes, the global context is generally dominated by features of salient patterns (e.g., large objects) and tends to smooth over inconspicuous patterns (e.g., small stuff and boundaries). In this article, a novel joint framework named the height-embedding context reassembly network (HECR-Net) is proposed. First, because corresponding elevation data is often unavailable while height information remains valuable, our method alleviates this data constraint by simultaneously predicting semantic labels and height maps from single aerial images, implicitly distilling height-aware embeddings. Second, we introduce a novel context-aware reorganization module that generates a discriminative feature with global context appropriately assigned to each local position; it benefits from both a global context aggregation module for ambiguity elimination and a local feature redistribution module for detail refinement. Third, we make full use of the learned height-aware embeddings to boost semantic segmentation performance by introducing a modality-affinitive propagation block. Finally, without bells and whistles, segmentation results on the ISPRS Vaihingen and Potsdam data sets show that the proposed HECR-Net achieves state-of-the-art performance.
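The joint-prediction idea described above, one shared representation feeding both a semantic-labeling head and a height-estimation head, can be illustrated with a minimal NumPy sketch. This is a toy illustration under assumed shapes and randomly initialized weights, not the authors' architecture; all function names and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_encoder(image, w):
    # Toy "encoder": a per-pixel linear projection followed by ReLU.
    # image: (H, W, C_in) -> features: (H, W, C_feat)
    return np.maximum(image @ w, 0.0)

def segmentation_head(feat, w_seg):
    # Per-pixel class logits: (H, W, C_feat) -> (H, W, n_classes)
    return feat @ w_seg

def height_head(feat, w_h):
    # Per-pixel scalar height: (H, W, C_feat) -> (H, W)
    return (feat @ w_h).squeeze(-1)

# Hypothetical dimensions for the sketch.
H, W, C_in, C_feat, n_classes = 4, 4, 3, 8, 6
image = rng.random((H, W, C_in))
w_enc = rng.standard_normal((C_in, C_feat))
w_seg = rng.standard_normal((C_feat, n_classes))
w_h = rng.standard_normal((C_feat, 1))

feat = shared_encoder(image, w_enc)                 # shared embedding
labels = segmentation_head(feat, w_seg).argmax(-1)  # semantic map (H, W)
height = height_head(feat, w_h)                     # height map (H, W)

print(labels.shape, height.shape)  # (4, 4) (4, 4)
```

The key point of the sketch is that both outputs are computed from the same feature tensor, so gradients from the height task would shape the embedding used for segmentation; this is the mechanism the paper exploits, though its actual modules are far richer.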

Highlights

  • We present a joint reasoning network for dense prediction tasks in complex scenes, namely the Height-Embedding Context Reassembly Network (HECR-Net)

  • We introduce a context-aware reorganization (CAR) module embedded with a global context aggregation module and a local feature redistribution module

Introduction

Semantic labeling in aerial images has been widely adopted in many fields, such as disaster prediction, building extraction, and resource exploration. The tremendous success of convolutional neural networks has borne out their formidable feature-extraction abilities in computer vision tasks [1], such as image classification [2, 3], object recognition and detection [4, 5], and scene segmentation [6, 7, 8]. Fully Convolutional Networks (FCNs) [6] have shown prominent improvements when applied to dense prediction tasks like semantic segmentation and height estimation.
