Correcting Imprecise Object Locations for Training Object Detectors in Remote Sensing Applications

Maximilian Bernhard, Matthias Schubert

doi:10.3390/rs13244962

Abstract

Object detection on aerial and satellite imagery is an important tool for image analysis in remote sensing and has many areas of application. As modern object detectors require accurate annotations for training, manual and labor-intensive labeling is necessary. In situations where GPS coordinates for the objects of interest are already available, there is potential to avoid the cumbersome annotation process. Unfortunately, GPS coordinates are often not well-aligned with georectified imagery. These spatial errors can be seen as noise regarding the object locations, which may critically harm the training of object detectors and, ultimately, limit their practical applicability. To overcome this issue, we propose a co-correction technique that allows us to robustly train a neural network with noisy object locations and to transform them toward the true locations. When applied as a preprocessing step on noisy annotations, our method greatly improves the performance of existing object detectors. Our method is applicable in scenarios where the images are only annotated with points roughly indicating object locations, instead of entire bounding boxes providing precise information on the object locations and extents. We test our method on three datasets and achieve a substantial improvement (e.g., 29.6% mAP on the COWC dataset) over existing methods for noise-robust object detection.

Highlights

Applications of machine learning and artificial intelligence have gained much attention in the remote sensing community in recent years.
We propose a training framework that builds upon a novel label correction scheme and allows for the learning of accurate class activation maps from noisy point supervision.
We propose a label correction scheme that takes noisy object locations, as well as a learned class activation map, as an input and corrects them toward their true location.
We demonstrate the high quality of our learned class activation maps by successfully mining bounding box sizes from them in a simplistic manner.
The predictions on the left were obtained after training on the original noisy annotations, and the predictions on the right were obtained after training with our corrected object locations.
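
The idea of correcting a noisy point with the help of a class activation map can be illustrated with a small sketch. This is not the authors' correction scheme, whose details are not given here. It is a simplified stand-in that assumes the corrected location is the pixel with the highest activation inside a square search window around the noisy point. The function name `correct_point`, the `radius` parameter, and the toy activation map are all hypothetical.

```python
import numpy as np


def correct_point(cam, point, radius):
    """Move a noisy (row, col) point to the strongest activation nearby.

    Assumption for illustration only: the "true" location is taken to be
    the argmax of the class activation map within a square window of
    half-width `radius` around the noisy point.
    """
    height, width = cam.shape
    row, col = point

    # Clip the search window to the image borders.
    r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
    c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
    window = cam[r0:r1, c0:c1]

    # Locate the maximum inside the window and map it back to image coordinates.
    d_row, d_col = np.unravel_index(np.argmax(window), window.shape)
    return int(r0 + d_row), int(c0 + d_col)


# Toy activation map with a single peak at (5, 6).
cam = np.zeros((9, 9))
cam[5, 6] = 1.0

# A noisy annotation at (3, 4) lies within 3 pixels of the peak.
corrected = correct_point(cam, (3, 4), radius=3)
print(corrected)
```

In this toy setting the noisy point is pulled onto the activation peak. A hard argmax is only one possible choice: it ignores how confident the map is and cannot separate neighbouring objects that share a window, which a learned or soft correction could handle.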

Summary

Introduction

Applications of machine learning and artificial intelligence have gained much attention in the remote sensing community in recent years. Object detection methods are often employed to recognize and localize objects of interest in aerial and satellite imagery, e.g., [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]. This kind of image analysis and interpretation is key for several areas of application, such as urban planning, precision agriculture, geological hazard detection, and geographic information system (GIS) updating [16,17]. In the case of deep-learning-based object detectors, annotations consist of bounding boxes in pixel coordinates within the training images and class labels describing the object classes. These bounding box annotations have to be available in high quality and large numbers. As annotation is usually carried out manually and potentially requires expert knowledge [2,3,4,6,8,9], the availability of annotations poses an obstacle in many scenarios.

Objectives

Methods

Results

Conclusion