Dense context distillation network for semantic parsing of oblique UAV images

Youli Ding, Shuhan Shen, Xianwei Zheng, Hanjiang Xiong, Yiping Chen

doi:10.1016/j.jag.2022.103062

Abstract

Semantic segmentation of oblique unmanned aerial vehicle (UAV) images serves as a foundation for many modern urban applications, such as road scene monitoring and semantic 3D modeling. However, objects in UAV images can vary greatly in size and undergo severe perspective distortion because of the oblique viewing angle. Existing general segmentation models designed for ground and remote sensing images rarely consider these challenges specific to UAV images. They therefore have great difficulty learning discriminative representations for reasoning simultaneously about the extremely large and extremely small objects in UAV images. In this paper, we propose a dense context distillation network (DCDNet) to learn distortion-robust feature representations for semantic segmentation of UAV images. The basic DCDNet is built as a dual-branch encoder–decoder architecture. To accomplish the goal of dense context distillation, DCDNet is first equipped with several cross-scale context selectors at different encoding stages to densely and selectively gather useful context from low- to high-level dual-scale feature maps. A joint supervision is then applied to reinforce the learning of shallower features, distilling more of the low-level context that is vital to reasoning about small or thin structures. A multi-scale feature aggregator is incorporated to adaptively fuse the long-range context during decoding, which combines the complementary merits of the dense context captured from feature maps of different levels. With dense context distillation, DCDNet is better able to provide objects of different scales with the context they require for learning and prediction. Extensive experiments on the challenging UAVid dataset demonstrate that DCDNet adapts well to oblique UAV images, achieving state-of-the-art segmentation performance with a mean intersection-over-union (mIoU) score of 72.38%.

• A DCDNet is proposed for semantic parsing of UAV images.
• DCDNet offers enough context for reasoning about both large and small road scene objects.
• DCDNet achieves state-of-the-art segmentation performance.
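
For readers unfamiliar with the mIoU metric quoted in the abstract, the following is a minimal sketch of how a mean intersection-over-union score can be computed from predicted and ground-truth labels. It is an illustration only, not the authors' evaluation code: the function name `mean_iou`, the flattened-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions, and the official UAVid benchmark may accumulate statistics differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are flat sequences of integer class labels of equal
    length (a hypothetical input format chosen for this sketch).
    """
    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        union = tp + fp + fn
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps (made-up values, not UAVid data).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, 3))
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Because each class contributes equally regardless of its pixel count, small or thin structures weigh as much as large regions in the final score.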

Full Text