Abstract

Image classification is one of the main drivers of the rapid developments in deep learning with convolutional neural networks for computer vision. So is the analogous task of scene classification in remote sensing. However, in contrast to the computer vision community, which has long used well-established, large-scale standard datasets to train and benchmark high-capacity models, the remote sensing community still largely relies on relatively small and often application-dependent datasets, which limits comparability. With this paper, we present a classification-oriented conversion of the SEN12MS dataset. Using this version, we provide results for several baseline models based on two standard CNN architectures and different input data configurations. Our results support the benchmarking of remote sensing image classification and provide insights into the benefits of multi-spectral data and multi-sensor data fusion over conventional RGB imagery.

Highlights

  • One of the most crucial preconditions for the development of machine learning models for the interpretation of remote sensing data is the availability of annotated datasets

  • We present a conversion of the SEN12MS dataset for image classification, together with several baseline models and their evaluation

  • Notably, for both convolutional neural network (CNN) architectures, data fusion provides the best result for this class, while multi-spectral imagery yields the worst result, even worse than RGB only


Summary

INTRODUCTION

One of the most crucial preconditions for the development of machine learning models for the interpretation of remote sensing data is the availability of annotated datasets. The great success of deep learning was largely driven by the desire to solve the image classification problem, i.e., assigning one or more labels to a given photograph. For this purpose, many researchers have relied on the ImageNet database (Deng et al., 2009), which contains millions of annotated images. As can be seen from this non-exhaustive selection, most datasets built for remote sensing image classification deal with high-resolution aerial imagery, usually providing three or four spectral channels (RGB, or RGB plus near-infrared). [...] sensing-specific models that can later be fine-tuned to individual problems and user needs.

SEN12MS FOR IMAGE CLASSIFICATION
The Original SEN12MS Dataset
Creation of Scene Labels from Dense Labels
Dataset Statistics
BASELINE MODELS FOR SINGLE-LABEL AND MULTI-LABEL SCENE CLASSIFICATION
ResNet
DenseNet
Training Details
BENCHMARK RESULTS
SUMMARY & CONCLUSION
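
The outline item "Creation of Scene Labels from Dense Labels" is not expanded in this summary. As a rough illustration only, the following sketch shows one plausible way to turn a per-pixel (dense) class map into a single scene label and a multi-label set. The function name `scene_labels`, the majority-vote rule, the tie-breaking rule, and the `min_fraction` threshold are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def scene_labels(
    dense_labels: Sequence[Sequence[int]],
    min_fraction: float = 0.1,
) -> Tuple[int, List[int]]:
    """Derive scene-level labels from a per-pixel (dense) class map.

    Returns (single_label, multi_labels):
      - single_label: the class covering the most pixels (assumed rule)
      - multi_labels: sorted classes covering at least `min_fraction`
        of the patch (the threshold is an illustrative assumption)
    """
    # Count how many pixels belong to each class id.
    counts = Counter(pixel for row in dense_labels for pixel in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("dense_labels must contain at least one pixel")

    # Majority vote; ties go to the smaller class id so the result is deterministic.
    single_label = min(counts, key=lambda c: (-counts[c], c))

    # Keep every class whose area share reaches the threshold.
    multi_labels = sorted(c for c, n in counts.items() if n / total >= min_fraction)
    return single_label, multi_labels


# Hypothetical 4x4 patch with three class ids.
patch = [
    [1, 1, 1, 1],
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [3, 1, 1, 1],
]
single, multi = scene_labels(patch, min_fraction=0.25)
print(single, multi)
```

In this toy patch, class 1 covers 10 of 16 pixels and class 2 covers 5, so both pass a 25% threshold, while class 3 (one pixel) is dropped from the multi-label set.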
