Abstract

Timely and accurate Land Cover (LC) information is required for various applications, such as climate change analysis and sustainable development. Although machine learning algorithms are generally successful in LC mapping tasks, class imbalance is a common challenge. This problem arises during the training phase and reduces classification accuracy for infrequent and rare LC classes. To address this issue, this study proposes a new method that integrates random under-sampling of majority classes with an ensemble of Support Vector Machines, namely the Random Under-sampling Ensemble of Support Vector Machines (RUESVMs). The performance of RUESVMs for LC classification was evaluated in Google Earth Engine (GEE) over two different case studies using Sentinel-2 time-series data and five well-known spectral indices: the Normalized Difference Vegetation Index (NDVI), Green Normalized Difference Vegetation Index (GNDVI), Soil-Adjusted Vegetation Index (SAVI), Normalized Difference Built-up Index (NDBI), and Normalized Difference Water Index (NDWI). RUESVMs was also compared with the traditional SVM and with combinations of SVM and three benchmark data balancing techniques, namely Random Over-Sampling (ROS), Random Under-Sampling (RUS), and the Synthetic Minority Over-sampling Technique (SMOTE). The proposed method considerably improved the accuracy of LC classification, especially for the minority classes. Compared with the most accurate data balancing method (i.e., SVM-SMOTE), RUESVMs increased the overall accuracy of the generated LC map by approximately 4.95 percentage points and the geometric mean of producer’s accuracies by almost 3.75 percentage points. RUESVMs also outperformed SVM-SMOTE in the geometric mean of user’s accuracies, with an average increase of 6.45 percentage points.
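The general idea of combining random under-sampling with an ensemble can be sketched as follows. This is a minimal illustration and not the authors' GEE implementation. Several details are assumptions: each ensemble member is trained on a subset in which every class is cut down to the size of the smallest class, the members are combined by majority vote, and a simple nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, rng):
    """Randomly shrink every class to the size of the smallest class (assumed scheme)."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    xs_out, ys_out = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            xs_out.append(x)
            ys_out.append(y)
    return xs_out, ys_out


def fit_centroids(samples, labels):
    """Stand-in base learner: one mean vector per class (NOT an SVM)."""
    counts = Counter(labels)
    sums = {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: sqdist(model[y]))


def fit_ensemble(samples, labels, n_members=10, seed=0):
    """Train each member on its own randomly under-sampled, balanced subset."""
    rng = random.Random(seed)
    return [fit_centroids(*undersample(samples, labels, rng))
            for _ in range(n_members)]


def predict_ensemble(models, x):
    """Combine the members by majority vote (assumed combination rule)."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Because every member sees a different random subset of the majority classes, the ensemble as a whole uses more of the majority-class samples than a single under-sampled classifier would, while each member still trains on balanced data. In practice the stand-in learner would be replaced by an SVM.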

Highlights

  • Land Cover (LC) data are important for various studies, such as climate change, agricultural monitoring, water resource management, natural hazards, and land change assessment [1,2,3,4]

  • The Random Under-sampling Ensemble of Support Vector Machines (RUESVMs) method was applied to Site-1 with 100 different fractions, and the corresponding accuracies were assessed; the complete results are given in Supplementary Materials S4

  • The Overall Accuracy (OA), geometric mean of Producer's Accuracies (GM-PA), and geometric mean of User's Accuracies (GM-UA) obtained from RUESVMs-47 and RUESVMs-27 were all above 89%, indicating the high potential of the proposed algorithm for delineating both the minority and majority classes
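The three accuracy measures named above can be computed from a confusion matrix. The sketch below uses the standard definitions (producer's accuracy is the per-class recall, user's accuracy is the per-class precision). The matrix layout, the function name, and the choice to score an empty class as 0 are assumptions made for illustration, and the numbers are invented rather than taken from the paper.

```python
import math


def accuracy_metrics(cm):
    """cm[i][j] = number of samples of reference class i that were mapped to class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    row_sums = [sum(cm[i]) for i in range(n)]                        # reference totals
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # mapped totals
    # Producer's accuracy: correct / reference total; user's accuracy: correct / mapped total.
    # A class with no samples is scored 0.0 here (an assumption, to avoid division by zero).
    pa = [cm[i][i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    ua = [cm[j][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]

    def geometric_mean(values):
        return math.prod(values) ** (1.0 / len(values))

    return {"OA": correct / total,
            "GM_PA": geometric_mean(pa),
            "GM_UA": geometric_mean(ua)}


# Invented example: a classifier that maps everything to the majority class.
always_majority = [[90, 0],
                   [10, 0]]
print(accuracy_metrics(always_majority))
```

In this invented example the overall accuracy is 0.9 although the minority class is never detected, and both geometric means drop to 0. This is why the geometric means are useful alongside OA when the classes are imbalanced.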

Summary

Introduction

Land Cover (LC) data are important for various studies, such as climate change, agricultural monitoring, water resource management, natural hazards, and land change assessment [1,2,3,4]. Improving LC classification accuracy with the help of Machine Learning (ML) algorithms to meet users’ needs has drawn considerable attention from the Remote Sensing (RS) community [13,14,15,16]. However, ML methods provide inferior performance for infrequent LC classes [17,18]. This is because most ML classifiers try to decrease the overall error rate during the training phase, which leads to higher accuracy for the main classes and lower accuracy for the infrequent classes [19,20,21]. An imbalanced distribution among the acquired samples of different LC classes can potentially influence the accuracy of LC classifications using ML algorithms [19,27,28].
