Digital Soil Mapping (DSM) presents a scalable and efficient alternative to traditional soil analysis, which is typically constrained by labor-intensive procedures, long turnaround times and low spatial resolution. By combining computational techniques such as machine learning with remote sensing, DSM addresses these limitations and improves the accuracy, efficiency and scalability of soil property assessments. This study, conducted across Tamil Nadu, India, applied DSM with Random Forest (RF) models to predict two key soil properties: pH and Soil Organic Matter (SOM). We employed Conditioned Latin Hypercube Sampling (cLHS) to optimize the selection of sampling points and the Boruta algorithm to identify the most relevant covariates for modeling. The RF models were tuned with a comprehensive grid search over ntree (the number of trees) from 500 to 2000 and mtry (the number of covariates sampled at each split) from 1 to 11. The best-performing configuration, 2000 trees with mtry set to 1, yielded Root Mean Square Error (RMSE) values of 0.71 for SOM and 0.60 for pH, indicating a high level of predictive accuracy. Our findings emphasize the critical role of remote sensing indices in predicting SOM, whereas pH was influenced by both terrain features and remote sensing data. Compared with previous studies, this research offers novel improvements in both sampling optimization and model configuration, leading to enhanced predictive performance. These results have significant potential to support sustainable land-use planning, agricultural productivity and environmental management.
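The grid search described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes an ntree step of 500, because the abstract gives only the 500 to 2000 range. The function `toy_cv_rmse` is a hypothetical placeholder for the real step, which would fit an RF with the given ntree and mtry and compute RMSE on held-out predictions, for example via cross-validation using R's randomForest or scikit-learn's `RandomForestRegressor(n_estimators=..., max_features=...)`. Its values are synthetic and are not the study's results.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Params = Tuple[int, int]  # (ntree, mtry)

# ASSUMPTION: step of 500 for ntree; the abstract only states the 500-2000 range.
NTREE_VALUES = [500, 1000, 1500, 2000]
MTRY_VALUES = list(range(1, 12))  # mtry from 1 to 11 inclusive


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error: sqrt(mean((observed - predicted)^2))."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def build_grid(ntree_values: Sequence[int], mtry_values: Sequence[int]) -> List[Params]:
    """Every (ntree, mtry) combination, i.e. the full Cartesian product."""
    return list(itertools.product(ntree_values, mtry_values))


def grid_search(
    grid: Sequence[Params], cv_rmse: Callable[[int, int], float]
) -> Tuple[Params, Dict[Params, float]]:
    """Score every configuration and return the one with the lowest RMSE."""
    scores: Dict[Params, float] = {}
    for ntree, mtry in grid:
        scores[(ntree, mtry)] = cv_rmse(ntree, mtry)
    best = min(scores, key=scores.get)  # lowest error wins; ties keep the first seen
    return best, scores


def toy_cv_rmse(ntree: int, mtry: int) -> float:
    """HYPOTHETICAL stand-in for fitting an RF and cross-validating it.

    A made-up smooth error surface so the search loop runs without data;
    a real version would return rmse(held_out_observed, held_out_predicted).
    """
    return 0.70 + 0.01 * (mtry - 1) + 10.0 / ntree


best_params, all_scores = grid_search(build_grid(NTREE_VALUES, MTRY_VALUES), toy_cv_rmse)
print("best (ntree, mtry):", best_params, "RMSE:", round(all_scores[best_params], 4))
print("example RMSE:", rmse([5.0, 6.5, 7.2], [5.3, 6.1, 7.0]))
```

Passing the scoring step in as a callable keeps the exhaustive search independent of the modeling backend. Replacing `toy_cv_rmse` with a real cross-validated RF fit leaves the loop unchanged.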