Abstract

Instance segmentation in aerial images is of great significance for remote sensing applications, but it is inherently challenging because of cluttered backgrounds, extremely dense and small objects, and objects with arbitrary orientations. In addition, current mainstream CNN-based methods often face a trade-off between labeling cost and performance. To address these problems, we present a hybrid supervision pipeline. Within the pipeline, we design an ancillary segmentation model with a bounding box attention module and a bounding box filter module. The model generates accurate pseudo pixel-wise labels from real-world aerial images, which can be used to train any instance segmentation model. Specifically, the bounding box attention module suppresses noise from cluttered backgrounds and improves the segmentation of small objects. The bounding box filter module removes false positives caused by cluttered backgrounds and densely distributed objects. Our ancillary segmentation model locates objects pixel-wise instead of relying on horizontal bounding box prediction, which makes it more adaptable to arbitrarily oriented objects. Furthermore, oriented bounding box labels are used to handle arbitrarily oriented objects. Experiments on the iSAID dataset show that, using only 10% pixel-wise labels, the proposed method achieves performance (32.1 AP) comparable to fully supervised methods (33.9 AP) and clearly higher than the weakly supervised setting (26.5 AP).

Highlights

  • Instance segmentation in aerial images is an important task, which benefits various applications, e.g., monitoring of land changes [1], urban management [2] and traffic monitoring [3]

  • We present a hybrid supervision pipeline that combines the low labeling cost of bounding box labels with the high accuracy of pixel-wise labels

  • We first introduce the idea of hybrid supervision, then describe the design of the ancillary segmentation model and how we address the challenges in aerial images


Summary

Introduction

Instance segmentation in aerial images is an important task that benefits various applications, e.g., monitoring of land changes [1], urban management [2] and traffic monitoring [3]. With the rapid development of deep convolutional neural networks (CNNs), CNN-based instance segmentation methods have reached increasingly high performance. This performance, however, depends on the availability of large-scale image datasets with accurate manually annotated labels. Labeling a bounding box on an object takes 10.2 s on average, while labeling a segmentation annotation takes 79 s, which is about 8× slower [10]. Aerial images usually cover a wide field of view, which means they contain many more objects of interest than natural images. Take the iSAID dataset [11] as an example: it is a large-scale aerial image dataset in which each image has 233.6 instances on average. There are only 2.8 instances per image in natural image dataset
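To make the labeling-cost argument concrete, the per-annotation times and the iSAID instance density quoted above can be combined into a rough estimate. The sketch below is illustrative only: the function name `labeling_hours`, the image count of 1000, and the cost model itself (a given fraction of instances receives a pixel-wise mask, every other instance receives only a bounding box, and the 10.2 s box time applies to whatever box type is used) are assumptions, not figures or procedures from the paper.

```python
# Per-annotation times quoted in the text, in seconds.
BOX_SECONDS = 10.2
MASK_SECONDS = 79.0
# Average number of instances per iSAID image, as quoted in the text.
INSTANCES_PER_IMAGE = 233.6


def labeling_hours(num_images, mask_fraction):
    """Estimate annotation time in hours.

    Assumed cost model (not from the paper): a fraction `mask_fraction`
    of instances gets a pixel-wise mask, the rest get a bounding box only.
    """
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be between 0 and 1")
    instances = num_images * INSTANCES_PER_IMAGE
    seconds_per_instance = (
        mask_fraction * MASK_SECONDS + (1.0 - mask_fraction) * BOX_SECONDS
    )
    return instances * seconds_per_instance / 3600.0


if __name__ == "__main__":
    num_images = 1000  # hypothetical dataset size, for illustration only
    full = labeling_hours(num_images, 1.0)    # every instance gets a mask
    hybrid = labeling_hours(num_images, 0.1)  # 10% masks, 90% boxes
    weak = labeling_hours(num_images, 0.0)    # boxes only
    print(f"full: {full:.0f} h, hybrid: {hybrid:.0f} h, weak: {weak:.0f} h")
    print(f"hybrid / full = {hybrid / full:.3f}")
```

Under this assumed model, the average cost per instance with 10% pixel-wise labels is 0.1 × 79 + 0.9 × 10.2 = 17.08 s, or roughly 22% of the 79 s needed for full pixel-wise annotation, regardless of the number of images.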

