Accurate prediction of the spatial distribution of soil sand content is a prerequisite for land use management, soil quality evaluation and erosion control, because sand content governs the transport and movement of soil water, fertilizer, air and heat. Digital soil mapping (DSM) is widely used to predict soil properties. However, practical research is still needed to select a prediction model that is both cost-effective and accurate at a given sampling density. In this study, topsoil samples were collected at 2,848 sampling points across the eastern plains of China (107,200 km²). The performance of different prediction models for mapping soil sand content was compared at 12 levels of sampling density. In addition, the geographical detector, a statistical method for assessing the spatial stratified heterogeneity of variables, was used to identify the major drivers of spatial variation in soil sand content. The results indicated that climate factors were the major drivers of the spatial variability in soil sand content. At the full (100%) sample size (26.57 samples/10³ km²), the geostatistical models that did not depend on environmental variables (ordinary kriging and sequential Gaussian simulation) performed best, followed by the machine learning models (random forest, cubist and support vector machine) and then the geostatistical model with environmental variables (co-kriging). Sampling density had a considerable impact on model accuracy, and the advantages of the machine learning models became apparent when the sampling density fell below 20% of the full sample (5.31 samples/10³ km²). Therefore, the prediction model and sampling density should be chosen together to obtain maps of soil sand content both economically and accurately. This study provides a useful reference for selecting prediction methods in the practical application of DSM.
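The geographical detector mentioned above quantifies how much of the variance in a variable (here, soil sand content) is explained by stratifying the study area by a candidate driver, such as climate zones. The abstract does not state the formula, so the following is a minimal sketch that assumes the commonly used factor-detector q-statistic, q = 1 − Σₕ Nₕσₕ² / (Nσ²), computed with population variances. The function name `geodetector_q` and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import pvariance


def geodetector_q(values, strata):
    """Hypothetical helper: factor-detector q-statistic (assumed standard form).

    q = 1 - sum_h(N_h * var_h) / (N * var), using population variances.
    q near 1 means the strata explain most of the spatial variance;
    q near 0 means they explain almost none of it.
    """
    if len(values) != len(strata):
        raise ValueError("values and strata must have the same length")

    # Group observations by stratum (e.g. climate zone of each sample point).
    groups = defaultdict(list)
    for value, stratum in zip(values, strata):
        groups[stratum].append(float(value))

    n = len(values)
    total_var = pvariance([float(v) for v in values])
    if total_var == 0:
        raise ValueError("total variance is zero; q is undefined")

    # Within-strata sum of variances, weighted by stratum size.
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / (n * total_var)


# Toy example (invented values): sand content (%) at six points in two zones.
sand = [60, 62, 58, 30, 32, 28]
zones = ["arid", "arid", "arid", "humid", "humid", "humid"]
print(round(geodetector_q(sand, zones), 4))
```

In this toy case the zones separate high-sand from low-sand points almost perfectly, so q is close to 1. The real analysis would apply the same idea to each environmental factor and compare their q values.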