Abstract

Remote sensing imagery, such as that provided by the United States Geological Survey (USGS) Landsat satellites, has been widely used to study environmental protection, hazard analysis, and urban planning for decades. Clouds are a constant challenge for such imagery and, if not handled correctly, can cause a variety of issues for a wide range of remote sensing analyses. Typically, cloud mask algorithms use the entire image; in this study we present an ensemble of different pixel-based approaches to cloud pixel modeling. Based on four training subsets with a selection of different input features, 12 machine learning models were created. We evaluated these models using the cropped LC8-Biome cloud validation dataset. As a comparison, Fmask was also applied to the cropped scene Biome dataset. One goal of this research is to explore a machine learning modeling approach that uses as small a training data sample as possible but still provides an accurate model. Overall, the model trained on the sample subset (1.3% of the total training samples) that includes unsupervised Self-Organizing Map classification results as an input feature has the best performance. The approach achieves 98.57% overall accuracy, 1.18% cloud omission error, and 0.93% cloud commission error on the 88 cropped test images. By comparison to Fmask 4.0, this model improves the accuracy by 10.12% and reduces the cloud omission error by 6.39%. Furthermore, using an additional eight independent validation images that were not sampled in model training, the model trained on the second largest subset with an additional five features has the highest overall accuracy at 86.35%, with 12.48% cloud omission error and 7.96% cloud commission error. This model’s overall correctness increased by 3.26%, and the cloud omission error decreased by 1.28% compared to Fmask 4.0. The machine learning cloud classification models discussed in this paper could achieve very good performance utilizing only a small portion of the total training pixels available. We showed that a pixel-based cloud classification model, and that as each scene obviously has unique spectral characteristics, and having a small portion of example pixels from each of the sub-regions in a scene can improve the model accuracy significantly.

Highlights

  • We showed that a pixel-based cloud classification model, and that as each scene obviously has unique spectral characteristics, and having a small portion of example pixels from each of the sub-regions in a scene can improve the model accuracy significantly

  • Total Cloud where correctness measures the overall model correctness for all three classes, cloud commission error measures the portion of pixels that are estimated as cloud but are not, cloud omission error measures the portion of cloud pixels that failed to be detected

  • The empirical pixel-based machine learning models discussed in this paper could achieve very good and consistent performance when using only a tiny portion of the total available training data

Read more

Summary

Introduction

Remote sensing imagery, such as that provided by the United States Geological Survey (USGS) Landsat satellites, has been widely used to study environmental protection, hazard analysis, and urban planning for decades. The usefulness and applications of remote sensing imagery continue to expand as more image-based models and algorithms have emerged so that we can derive more knowledge and information from satellite images. Due to the ubiquity of clouds, cloud pixels are a persistent presence in such imagery, especially in tropical areas. A study estimates that about 67% of the earth’s surface is typically covered by cloud based on satellite data from July 2002 to April 2015 [1]. The presence of cloud has a serious impact on the use of remote sensing images.

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call