MODIS probabilistic cloud masking over the Amazonian evergreen tropical forests: a comparison of machine learning-based methods

José Gomis-Cebolla, Juan Carlos Jimenez, José Antonio Sobrino

doi:10.1080/01431161.2019.1637963

Abstract

Amazonian tropical forests play a significant role in the global water, carbon and energy cycles. Satellite remote sensing offers a feasible means of monitoring these forests, and the Moderate Resolution Imaging Spectroradiometer (MODIS) is among the major tools for studying this region. Nevertheless, operational MODIS retrievals of surface variables have been reported to be affected by cloud contamination. Proper cloud masking is therefore essential for an accurate analysis of the current and future status of the Amazonian tropical forests. In the present study, the potential of supervised machine learning algorithms to overcome this issue is evaluated. Compared with the global operational MODIS cloud masking algorithms (MYD35 and the Multi-Angle Implementation of Atmospheric Correction Algorithm (MAIAC)), these algorithms have the advantage that they can be optimized to represent the local cloud conditions of the region. The models considered were Gaussian Naïve Bayes (GNB), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Random Forests (RF), Support Vector Machine (SVM) and Multilayer Perceptron (MLP). These algorithms provide a continuous measure of cloud masking uncertainty (i.e. a probability estimate of each pixel belonging to the clear and cloudy classes) and can therefore be used for probabilistic cloud masking. The requirement for a truth reference dataset (a priori knowledge) was satisfied by collocating Cloud Profiling Radar (CPR) and Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) observations with MODIS. Model performance was tested using three independent datasets: 1) collocated CPR/CALIOP and MODIS data, 2) manually classified MODIS images and 3) in-situ ground data. For satellite image and in-situ testing, the results were additionally compared with the current operational MYD35 (version 6.1) and MAIAC cloud masking algorithms.
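To make the idea of probabilistic cloud masking concrete, here is a minimal sketch of a Gaussian Naïve Bayes classifier (one of the six models listed above) that turns per-pixel features into a probability of the pixel being cloudy. The two features (a visible reflectance and an 11 µm brightness temperature), the training samples and the 0.5 decision threshold are hypothetical illustrations; they are not the feature set, training data or threshold used in the study. In practice, scikit-learn's `GaussianNB`, `LinearDiscriminantAnalysis`, `QuadraticDiscriminantAnalysis`, `RandomForestClassifier`, `SVC(probability=True)` and `MLPClassifier` all expose a `predict_proba` method returning the same kind of per-class probability.

```python
import math
from statistics import mean, pvariance


def fit_gnb(X, y):
    """Estimate per-class prior, feature means and feature variances."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))  # transpose: one tuple per feature
        model[c] = {
            "prior": len(rows) / len(y),
            "mean": [mean(col) for col in cols],
            # A tiny floor keeps the variance positive for constant features.
            "var": [pvariance(col) + 1e-9 for col in cols],
        }
    return model


def log_gaussian(v, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)


def predict_proba(model, x):
    """Posterior probability of each class for one pixel's feature vector."""
    # Naive Bayes: features are treated as independent given the class.
    log_post = {
        c: math.log(p["prior"])
        + sum(log_gaussian(v, mu, var) for v, mu, var in zip(x, p["mean"], p["var"]))
        for c, p in model.items()
    }
    # Normalise in log space (log-sum-exp) to avoid underflow.
    top = max(log_post.values())
    z = sum(math.exp(lp - top) for lp in log_post.values())
    return {c: math.exp(lp - top) / z for c, lp in log_post.items()}


# Hypothetical training pixels: [visible reflectance, 11 um brightness temperature (K)].
X_train = [
    [0.04, 298.0], [0.05, 300.0], [0.06, 296.5], [0.05, 297.0],  # clear forest
    [0.45, 255.0], [0.60, 240.0], [0.38, 262.0], [0.52, 248.0],  # cloudy
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = clear, 1 = cloudy

model = fit_gnb(X_train, y_train)
p_cloud_bright = predict_proba(model, [0.50, 250.0])[1]  # bright, cold pixel
p_cloud_dark = predict_proba(model, [0.05, 298.0])[1]    # dark, warm pixel

# Binary mask derived from the probabilities, using an assumed 0.5 threshold.
mask = [int(p >= 0.5) for p in (p_cloud_bright, p_cloud_dark)]
print(p_cloud_bright, p_cloud_dark, mask)
```

Keeping the probability rather than only the binary mask lets a downstream processing chain choose a stricter or looser threshold, depending on whether missed clouds or discarded clear pixels are more harmful for the retrieval at hand.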
Satellite image and in-situ testing results show that the machine learning algorithms are able to improve on the performance of operational MODIS cloud masking over the region. MYD35 and MAIAC tend to underestimate and overestimate, respectively, the cloud cover over the study region. Amongst the models considered, the probabilistic algorithms (LDA, GNB and, to a lesser extent, QDA) performed better than the RF, SVM and MLP machine learning algorithms, as they coped better with the limited range of viewing conditions that resulted from collocating MODIS and CPR/CALIOP observations. In particular, the best performance was obtained for LDA, with a difference in Kappa coefficient (model minus MODIS operational algorithm) of 0.293/0.155 (MYD35/MAIAC, respectively) in the satellite image testing validation. The worst performance was obtained for MLP, with a difference in Kappa coefficient of 0.175/0.037. For in-situ testing, the models' overall accuracy (OA) and Kappa coefficient values are higher than the respective MYD35/MAIAC values. The models are computationally efficient (swath image calculation times between 0.37 and 9.49 s) and could therefore be implemented in remote-sensing vegetation retrieval processing chains over the Amazonian tropical forests. LDA stands out as the best candidate because it combines the highest accuracy with the lowest associated computational cost.
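The overall accuracy (OA) and Kappa coefficient reported above can both be computed from a 2×2 confusion matrix of reference versus predicted labels. The sketch below assumes binary labels (1 = cloudy, 0 = clear) and uses a small, hypothetical set of reference labels and two hypothetical masks; it only illustrates how a Kappa difference of the form "model minus operational algorithm" is obtained and does not reproduce the study's data.

```python
def confusion_2x2(truth, pred):
    """counts[t][p]: number of pixels with reference label t and predicted label p."""
    counts = [[0, 0], [0, 0]]
    for t, p in zip(truth, pred):
        counts[t][p] += 1
    return counts


def overall_accuracy(truth, pred):
    cm = confusion_2x2(truth, pred)
    n = sum(map(sum, cm))
    return (cm[0][0] + cm[1][1]) / n


def kappa(truth, pred):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    cm = confusion_2x2(truth, pred)
    n = sum(map(sum, cm))
    p_o = (cm[0][0] + cm[1][1]) / n
    # Chance agreement from the products of row (reference) and column (predicted) marginals.
    p_e = sum(sum(cm[i]) * (cm[0][i] + cm[1][i]) for i in range(2)) / n ** 2
    if p_e == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical reference labels and masks (1 = cloudy, 0 = clear).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_mask = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
operational_mask = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses most of the clouds

delta_kappa = kappa(truth, model_mask) - kappa(truth, operational_mask)
print(f"OA: {overall_accuracy(truth, model_mask):.3f} vs {overall_accuracy(truth, operational_mask):.3f}")
print(f"Kappa difference (model minus operational): {delta_kappa:.3f}")
```

Because Kappa discounts the agreement expected by chance, a mask that mostly predicts the majority class can reach a respectable OA while its Kappa stays low; in this toy example the gap between the two masks is larger in Kappa than in OA.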

Full Text