Abstract

The tremendous advances in deep neural networks have demonstrated the superiority of deep learning techniques for applications such as object recognition and image classification. Nevertheless, deep learning-based methods usually require a large amount of training data, which mostly comes from manual annotation and is quite labor-intensive. To reduce the manual work required to generate sufficient training data, we propose to leverage existing labeled data to generate image annotations automatically. Specifically, pixel labels are first transferred from one image modality to another via geometric transformation to create initial image annotations; additional information (e.g., height measurements) is then incorporated through Bayesian inference to update the labeling beliefs. Finally, the updated label assignments are optimized with a fully connected conditional random field (CRF), yielding refined labels for all pixels in the image. The proposed approach is tested in two scenarios: (1) label propagation from annotated aerial imagery to unmanned aerial vehicle (UAV) imagery and (2) label propagation from a map database to aerial imagery. In each scenario, the refined image labels are used as pseudo-ground truth data for training a convolutional neural network (CNN). Results demonstrate that our model produces accurate label assignments even around complex object boundaries. Moreover, the generated image labels can be effectively leveraged for training CNNs and achieve classification accuracy comparable to manual image annotations: the per-class accuracies of networks trained on manual annotations and on the generated labels lie within ±5% of each other.
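To make the pipeline concrete, the sketch below illustrates the second and third stages: a per-pixel Bayesian update of the transferred label beliefs using height measurements, followed by fully connected CRF refinement. This is a minimal sketch, not the paper's implementation: the Gaussian height likelihood, the per-class height statistics, and all CRF parameters are illustrative assumptions, and the CRF step relies on the pydensecrf package (a wrapper of Krähenbühl and Koltun's fully connected CRF).

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def bayesian_update(prior, height, height_mu, height_sigma):
    """Update per-pixel label beliefs with a height measurement.

    prior        : (K, H, W) transferred label probabilities
    height       : (H, W) height per pixel (e.g., from an nDSM)
    height_mu    : (K,) assumed mean height per class (illustrative)
    height_sigma : (K,) assumed height std-dev per class (illustrative)
    """
    # Gaussian height likelihood p(h | class k) -- an assumption for this
    # sketch, not necessarily the measurement model used in the paper.
    h = height[None, :, :]
    mu = height_mu[:, None, None]
    sig = height_sigma[:, None, None]
    lik = np.exp(-0.5 * ((h - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
    post = lik * prior                        # Bayes: posterior ∝ likelihood × prior
    return post / post.sum(axis=0, keepdims=True)

def crf_refine(image, posterior, n_iters=5):
    """Refine per-pixel beliefs with a fully connected CRF.

    image     : (H, W, 3) uint8 RGB image
    posterior : (K, H, W) updated label probabilities
    """
    K, H, W = posterior.shape
    d = dcrf.DenseCRF2D(W, H, K)
    d.setUnaryEnergy(unary_from_softmax(posterior.astype(np.float32)))
    # Pairwise smoothness and appearance kernels; the parameters are common
    # defaults, not the values tuned in the paper.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(image), compat=10)
    Q = np.array(d.inference(n_iters))        # (K, H*W) approximate marginals
    return Q.argmax(axis=0).reshape(H, W)     # refined per-pixel labels
```

Treating the transferred labels as the prior and the height cue as an independent measurement keeps the update a simple per-pixel multiplication; the CRF then propagates these beliefs spatially so that label boundaries snap to image edges.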

Highlights

  • We propose an automatic image annotation method that propagates pixel labels across image modalities via geometric transformation, updates the labeling beliefs with Bayesian inference on height measurements, and refines them with a fully connected CRF

  • We demonstrate that the automatic annotations generated by our method can be directly used as ground truth data, achieving classification accuracy comparable to manual annotations in convolutional neural network (CNN) based classification

  • The average ground sampling distance (GSD) of the unmanned aerial vehicle (UAV) images is 1.8 cm, with an image size of 6000 × 4000 pixels; the aerial imagery was acquired by the DLR 4k sensor system [24] at an altitude of 600 m above ground, using two Canon EOS-1DX cameras with a 15° sideward-looking angle and a 75° field of view across track


Introduction

The last decade has witnessed the revolutionary success of deep neural networks. Supported by ever-increasing computing power, a variety of deep neural networks have emerged for a wide range of applications and have demonstrated significant improvements over traditional machine learning methods. Existing label propagation methods are generally applied to data of the same source, e.g., across video frames or from terrestrial point clouds to street-view images, where the source data and target imagery are highly similar in viewpoint and appearance. When it comes to multi-view imagery, label propagation suffers from view differences between the source data and the target images, resulting in sparse and erroneous annotations. The effect of large-scale pseudo-ground truth (PGT) on deep learning-based classification is investigated in [17]. In such cases, the propagated annotations were merely employed as augmented ground truth and trained together with manually labeled ground truth.
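As a concrete illustration of the geometric label transfer discussed above, the sketch below warps a label map from a source (aerial) image into a target (UAV) view. It assumes a single planar homography between the two views, which is a deliberate simplification: transfer between multi-view images would in practice involve the full photogrammetric geometry (camera poses and a surface model). The function name and fill-label convention are hypothetical; nearest-neighbor interpolation is used so that class indices are never blended.

```python
import numpy as np
import cv2

def transfer_labels(src_labels, H_src_to_dst, dst_shape, fill_value=255):
    """Warp a source label map into the target view.

    src_labels   : (Hs, Ws) integer class map of the source (aerial) image
    H_src_to_dst : (3, 3) homography mapping source pixels to target pixels
                   (a simplification of the full multi-view geometry)
    dst_shape    : (Ht, Wt) size of the target (UAV) image
    fill_value   : label assigned to target pixels with no source correspondence
    """
    Ht, Wt = dst_shape
    # INTER_NEAREST keeps labels as discrete class indices instead of
    # blending neighboring classes into meaningless intermediate values.
    return cv2.warpPerspective(
        src_labels.astype(np.uint8), H_src_to_dst, (Wt, Ht),
        flags=cv2.INTER_NEAREST, borderMode=cv2.BORDER_CONSTANT,
        borderValue=int(fill_value))
```

Target pixels that fall outside the source footprint receive the fill label and can be masked out as unlabeled when the propagated annotations are later used for training.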

Methodology
Inference
Image Annotation via Label Propagation from Aerial Imagery to UAV Imagery
Data Description
Data Pre-Processing
Label Transfer
Pixel Unary Potentials
Model Parameter Settings
Training a CNN Using Generated Annotations
Generating Image Annotations at Scale
Analysis of Inferred Annotations
Comparison with Manual Annotations
Comparison with OSM Building Footprints
Findings
Discussion
Conclusions