Abstract

This study compares the performance of machine learning (ML) algorithms for modeling soil salinity from field-based electrical conductivity (EC) data and Landsat-8 OLI satellite images with derived environmental covariates. We also interpret the ML models, trained with and without over-sampling, using Shapley additive explanations (SHAP) values, an explainable ML approach that has not yet been applied to soil salinity estimation. We investigate two case study areas in the western and southeastern Lake Urmia Playas (LUP) in northwestern Iran. Our study uses 26 environmental covariates, two ML models, namely extreme gradient boosting (XGBoost) and random forest (RF), and two over-sampling methods: synthetic minority over-sampling technique (SMOTE) and random over-sampling (ROS). Results indicate that XGBoost outperforms RF in terms of both R² and RMSE. Visual interpretation of the soil salinity maps also favored XGBoost. SMOTE produced better results than both ROS and the test cases without over-sampling. Finally, SHAP analysis showed that vegetation indices contributed more to the soil salinity prediction in the West LUP, while visible bands contributed more in the Southeast LUP region.
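The two metrics used for the model comparison can be illustrated with a minimal sketch. It assumes the standard definitions of R² (one minus the ratio of residual to total sum of squares) and RMSE. The EC values and the two sets of predictions below are invented for illustration only. They are not data or results from this study, and the function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical EC observations and predictions from two hypothetical models
# (illustrative numbers only, not taken from the study).
ec_observed = [2.0, 4.0, 6.0, 8.0]
model_a = [2.5, 3.5, 6.5, 7.5]
model_b = [3.0, 3.0, 7.0, 7.0]

for name, pred in (("model_a", model_a), ("model_b", model_b)):
    print(name, "R2 =", round(r2(ec_observed, pred), 3),
          "RMSE =", round(rmse(ec_observed, pred), 3))
```

A model is preferred when its R² is higher and its RMSE is lower on the same held-out samples, which is the sense in which the two models are ranked above.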
