Abstract

Classification of clouds, cirrus, snow, shadows, and clear-sky areas is a crucial step in the pre-processing of optical remote sensing images and a valuable input for their atmospheric correction. The Multi-Spectral Imager (MSI) on board the Sentinel-2 satellites of the Copernicus program offers bands optimized for this task and delivers unprecedented amounts of data in terms of spatial sampling, global coverage, spectral coverage, and repetition rate. Efficient algorithms are needed to process, or possibly reprocess, these large amounts of data. Techniques based on single-pixel top-of-atmosphere reflectance spectra, without exploiting external data or spatial context, offer the largest potential for parallel data processing and highly optimized processing throughput. Such algorithms can serve as a baseline for assessing trade-offs in processing performance when the application of more sophisticated methods is discussed. We present several ready-to-use classification algorithms, all based on a publicly available database of manually classified Sentinel-2A images. These algorithms are based on commonly used and newly developed machine learning techniques, which drastically reduce the time needed to update the algorithms when new images are added to the database. Several ready-to-use decision trees are presented that correctly label about 91 % of the spectra within a validation dataset. While decision trees are simple to implement and easy to understand, they offer only limited classification skill. Classification skill improves to 98 % when the presented algorithm based on the classical Bayesian method is applied. This method has only recently been used for this task and shows excellent performance in terms of both classification skill and processing performance. A comparison of the presented algorithms with other commonly used techniques, such as random forests, stochastic gradient descent, and support vector machines, is also given. Random forests and support vector machines, in particular, show classification skill similar to that of the classical Bayesian method.
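
The 91 % and 98 % figures are shares of validation spectra whose predicted label matches the manual label. A minimal sketch of that bookkeeping follows; it assumes the five classes named in the abstract are stored as plain string labels, and the helper names `overall_accuracy` and `confusion_matrix` are hypothetical, not taken from the paper.

```python
from collections import Counter

# Assumed label set: the five classes named in the abstract, as plain strings.
CLASSES = ["clear", "cloud", "cirrus", "snow", "shadow"]


def overall_accuracy(truth, predicted):
    """Fraction of validation spectra whose predicted label equals the manual label."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    correct = sum(t == p for t, p in zip(truth, predicted))
    return correct / len(truth)


def confusion_matrix(truth, predicted, classes):
    """Count (manual label, predicted label) pairs; rows are truth, columns are predictions."""
    counts = Counter(zip(truth, predicted))
    return [[counts[(t, p)] for p in classes] for t in classes]
```

The diagonal of the confusion matrix holds the correctly labeled spectra, so its trace divided by the number of spectra equals the overall accuracy, while the off-diagonal cells show which classes (for example cloud and cirrus) are confused with each other.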

Highlights

  • The detection of clouds, cirrus, and shadows is among the first processing steps after raw instrument measurements have been converted to at-sensor radiance or reflectance values

  • A robust discrimination of cloudy, cirrus-contaminated, and clear sky pixels is crucial for many applications, including the retrieval of surface reflectance within atmospheric correction or the co-registration with other images

  • The classification is performed by computing the occurrence probability for each class and selecting the class with the highest probability. Because occurrence probabilities are available for every class, a confidence measure for each classification follows directly. Such a measure can be of great importance in post-processing steps, where one might want to process only clear-sky pixels for which the classification algorithm was very certain
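
The probability-based classification and confidence measure described in the highlight above can be sketched as a histogram-based Bayesian classifier. This is an illustration under stated assumptions, not the paper's algorithm: it assumes the features are conditionally independent given the class (naive Bayes) and that each feature is a reflectance binned on a fixed range, and the class `HistogramBayes`, the helper `confident_clear`, the bin count, and the 0.9 confidence threshold are all hypothetical.

```python
import math
from collections import Counter


class HistogramBayes:
    """Per-pixel Bayesian classifier built from binned training spectra.

    Hypothetical sketch: assumes features are conditionally independent given
    the class (naive Bayes); n_bins, lo, hi, and alpha are illustrative values.
    """

    def __init__(self, n_bins=20, lo=0.0, hi=1.0, alpha=1.0):
        self.n_bins, self.lo, self.hi, self.alpha = n_bins, lo, hi, alpha

    def _bin(self, value):
        # Map a reflectance to a bin index, clipping values outside [lo, hi].
        frac = (value - self.lo) / (self.hi - self.lo)
        return min(self.n_bins - 1, max(0, int(frac * self.n_bins)))

    def fit(self, spectra, labels):
        self.classes = sorted(set(labels))
        n_features = len(spectra[0])
        class_counts = Counter(labels)
        total = len(labels)
        # Prior P(class) from the class frequencies in the labeled training data.
        self.log_prior = {c: math.log(class_counts[c] / total) for c in self.classes}
        # hist[c][j][b]: number of class-c spectra whose feature j falls in bin b.
        hist = {c: [[0] * self.n_bins for _ in range(n_features)] for c in self.classes}
        for x, c in zip(spectra, labels):
            for j, v in enumerate(x):
                hist[c][j][self._bin(v)] += 1
        # Laplace-smoothed log-likelihoods, so empty bins never yield probability 0.
        self.log_lik = {
            c: [[math.log((h + self.alpha) / (class_counts[c] + self.alpha * self.n_bins))
                 for h in hist[c][j]] for j in range(n_features)]
            for c in self.classes
        }
        return self

    def predict_proba(self, x):
        # Log-posterior up to a constant: log P(c) + sum_j log P(bin_j | c).
        scores = {c: self.log_prior[c]
                  + sum(self.log_lik[c][j][self._bin(v)] for j, v in enumerate(x))
                  for c in self.classes}
        # Normalize with the maximum subtracted to avoid underflow in exp().
        m = max(scores.values())
        weights = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}

    def predict(self, x):
        # Most probable class, plus its posterior probability as the confidence.
        proba = self.predict_proba(x)
        best = max(proba, key=proba.get)
        return best, proba[best]


def confident_clear(model, pixels, min_confidence=0.9):
    """Indices of pixels labeled "clear" with confidence >= min_confidence."""
    keep = []
    for i, x in enumerate(pixels):
        label, confidence = model.predict(x)
        if label == "clear" and confidence >= min_confidence:
            keep.append(i)
    return keep
```

Taking the maximum posterior as the confidence means that a pixel whose spectrum falls into bins populated equally by two classes receives a confidence near 0.5 and is excluded from the clear-sky selection, which is the post-processing use the highlight describes.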

Summary

Introduction

The detection of clouds, cirrus, and shadows is among the first processing steps after raw instrument measurements have been converted to at-sensor radiance or reflectance values. The detection performance of such algorithms can be used to establish a baseline for competing and potentially more sophisticated algorithms, so that their potential additional computational costs can be assessed quantitatively. To establish such a baseline, we decided to build up a database of labeled MSI spectra and to apply machine learning techniques to derive ready-to-use classification algorithms. We discuss decision trees with features computed from simple band math formulas (see Section 3.2), which are among the simplest and most easily understood techniques. These algorithms are simple to implement in any processing chain and can be represented by simple charts, but they offer only limited classification skill.
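
A decision tree over band-math features reduces to nested threshold tests on each pixel's spectrum. The sketch below illustrates that structure only: the tree layout, the use of the cirrus band B10, the blue band B02, the narrow NIR band B8A, a normalized difference snow index from B03 and B11, and every value in the hypothetical `THRESHOLDS` dictionary are placeholders, not the trees derived in the paper.

```python
# Hypothetical thresholds for illustration only; the paper derives its trees
# (features and split values) from the labeled database.
THRESHOLDS = {
    "cirrus": 0.01,  # B10 reflectance above which cirrus is suspected
    "bright": 0.30,  # B02 (blue) reflectance typical of bright clouds or snow
    "snow": 0.40,    # NDSI above which a bright pixel is called snow
    "dark": 0.05,    # B8A (narrow NIR) reflectance below which a pixel is called shadow
}


def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when both reflectances are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def classify_pixel(r, t=THRESHOLDS):
    """Walk a small binary decision tree over band-math features of one spectrum.

    r maps Sentinel-2 MSI band names ("B02", "B03", "B8A", "B10", "B11") to
    top-of-atmosphere reflectances.
    """
    if r["B10"] > t["cirrus"]:
        # Signal in the 1375 nm cirrus band: thick cloud if also bright in blue.
        return "cloud" if r["B02"] > t["bright"] else "cirrus"
    if r["B02"] > t["bright"]:
        # Bright pixel: snow stays bright in green but is dark near 1610 nm (B11).
        ndsi = normalized_difference(r["B03"], r["B11"])
        return "snow" if ndsi > t["snow"] else "cloud"
    # Dark pixel: very low NIR reflectance suggests shadow.
    return "shadow" if r["B8A"] < t["dark"] else "clear"
```

Because each pixel is classified from its own spectrum alone, such a function can be mapped over the pixels of a scene in any order or in parallel, which is the processing advantage of single-pixel methods noted in the abstract.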

Database of Manually Classified Sentinel-2 MSI Data
Classification Based on Machine Learning
Optimization and Validation Strategy
Ready-to-Use Decision Trees
Ready-to-Use Classical Bayesian
Comparison with Commonly Used Techniques
Findings
Conclusions