Abstract

Although sampling strategy plays an important role in groundwater potential mapping and significantly influences model accuracy, researchers often apply a simple random sampling method to determine absence (non-occurrence) samples. In this study, an automated, user-friendly geographic information system (GIS)-based tool, selection of absence samples (SAS), was developed using the Python programming language. The SAS tool takes into account different geospatial concepts, including nearest neighbor (NN) and hotspot analyses. In a case study, it was successfully applied to the Bojnourd watershed, Iran, together with two machine learning models (random forest (RF) and multivariate adaptive regression splines (MARS)) with GIS and remotely sensed data, to model groundwater potential. Different evaluation criteria (area under the receiver operating characteristic curve (AUC-ROC), true skill statistic (TSS), efficiency (E), false positive rate (FPR), true positive rate (TPR), true negative rate (TNR), and false negative rate (FNR)) were used to scrutinize model performance. Two absence sample types were produced, based on a simple random method and the SAS tool, and used in the models. The results demonstrated that both RF (AUC-ROC = 0.913, TSS = 0.72, E = 0.926) and MARS (AUC-ROC = 0.889, TSS = 0.705, E = 0.90) performed better when using absence samples generated by the SAS tool, indicating that this tool is capable of producing trustworthy absence samples to improve groundwater potential models.
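The threshold-based criteria listed above follow from a confusion matrix, and AUC-ROC can be read as the probability that a presence sample scores higher than an absence sample. The sketch below is illustrative only: the formulas are the standard definitions, treating efficiency (E) as overall accuracy is an assumption about the study's usage, and the function names and toy inputs are hypothetical, not the study's data.

```python
def threshold_metrics(y_true, y_pred):
    """Confusion-matrix criteria for binary labels (1 = presence, 0 = absence)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    return {
        "TPR": tpr,
        "TNR": tnr,
        "FPR": fp / (fp + tn),  # false positive rate
        "FNR": fn / (fn + tp),  # false negative rate
        "TSS": tpr + tnr - 1,   # true skill statistic
        # Assumption: efficiency (E) taken as overall accuracy.
        "E": (tp + tn) / len(pairs),
    }


def auc_roc(y_true, scores):
    """AUC-ROC as the share of (presence, absence) pairs ranked correctly.

    Ties count as half a correct ranking. Quadratic in sample size,
    which is acceptable for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and scores).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = threshold_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

Because TSS combines TPR and TNR, it penalizes a model that does well on presence samples only by misclassifying absence samples, which is why it is sensitive to how the absence samples were chosen.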

Highlights

  • Different approaches such as data-driven, statistical, and machine learning models can be used to model groundwater potential

  • Selecting appropriate absence samples is a considerable challenge, which researchers often address with a simple random sampling technique

  • The main finding of the study is that both the random forest (RF) and multivariate adaptive regression splines (MARS) models showed better predictive performance when based on absence samples created by the SAS tool rather than the simple random method

Summary

Introduction

Different approaches, such as data-driven, statistical, and machine learning models, can be used to model groundwater potential. They rest on the statistical assumption that the past and present states of a phenomenon are key to predicting its future state. Presence samples are usually obtained through field surveys and analyses of high-quality aerial photographs and satellite images. They are usually more reliable than absence samples because they are based on proof that the given phenomenon exists. Determining absence samples (i.e., non-spring locations) is usually a challenging task and can be a key source of model uncertainty, strongly affecting model performance [7,8].
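For orientation, the simple random baseline for absence samples can be sketched as below. This is a minimal illustration under stated assumptions, not the SAS tool: the study area is simplified to a rectangular bounding box, coordinates are planar, and the function name and the optional `min_dist` exclusion buffer around presence points are hypothetical additions (with `min_dist=0.0` the draw is purely random).

```python
import math
import random


def random_absence_points(presence, bounds, n, min_dist=0.0, seed=None,
                          max_tries=100_000):
    """Draw up to n random absence points inside a rectangular extent.

    presence : list of (x, y) presence locations (e.g., springs)
    bounds   : (xmin, ymin, xmax, ymax); a rectangle stands in for the
               watershed boundary, which is an assumption of this sketch
    min_dist : hypothetical exclusion radius around presence points
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    absence = []
    tries = 0
    while len(absence) < n and tries < max_tries:
        tries += 1
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Reject candidates that fall too close to any presence point.
        if all(math.hypot(x - px, y - py) >= min_dist for px, py in presence):
            absence.append((x, y))
    return absence


# Toy example with made-up coordinates.
springs = [(2.0, 2.0), (5.0, 7.0), (8.0, 3.0)]
extent = (0.0, 0.0, 10.0, 10.0)
absence_pts = random_absence_points(springs, extent, n=5, min_dist=1.0, seed=42)
```

A draw like this ignores where presence samples cluster, so an absence point can land in an area that is in fact favorable for groundwater, which is one way the choice of absence samples feeds uncertainty into the model.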
