Abstract
Digital soil mapping of soil particle-size fractions (PSFs) using log-ratio methods is a widely used technique. As a hybrid interpolator, regression kriging (RK) provides a way to improve prediction accuracy. However, RK has rarely been compared with other techniques when applied to compositional data, and it is not known whether its performance is robust across different balances of the isometric log-ratio (ILR) transformation. Here, we compared the generalized linear model (GLM), random forest (RF), and their hybrid patterns (RK) using different transformed data based on three ILR balances, with 29 environmental covariables (ECs), for the prediction of soil PSFs in the upper reaches of the Heihe River Basin (HRB), China. The results showed that the RF performed best, with more accurate predictions, but the GLM produced less biased predictions. As a hybrid interpolator, RK was recommended because it widened the data ranges of the prediction values and modified the bias and accuracy of most models, especially the RF. The prediction maps generated from RK revealed more details of the soil sampling points than the other models. Different data distributions were produced for the three ILR balances. Using the most abundant component of the compositional data as the first component of the permutations was not considered to be the right choice because it produced the worst performance. Based on the relative abundance of the components, we recommend that the focus should be on data distribution. This study provides a reference for the mapping of soil PSFs combined with transformed data at the regional scale.
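The hybrid idea behind RK can be sketched in a few lines: fit a trend on the covariates, krige the trend residuals, and add the two. The sketch below is only an illustration under loud assumptions. It is not the study's workflow: a one-covariate least-squares line stands in for the GLM or RF trend, the exponential variogram uses fixed, unfitted parameters with no nugget, the data are made up, and all function names are hypothetical. In a log-ratio workflow, each ILR coordinate would be predicted this way before back-transformation.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trend(covariate, target):
    """Least-squares line target ~ b0 + b1 * covariate (assumed stand-in for GLM/RF)."""
    n = len(target)
    sx = sum(covariate)
    sxx = sum(v * v for v in covariate)
    sy = sum(target)
    sxy = sum(v * t for v, t in zip(covariate, target))
    return solve([[n, sx], [sx, sxx]], [sy, sxy])


def variogram(h, sill=1.0, corr_range=50.0):
    """Exponential variogram without nugget; sill and range are assumed, not fitted."""
    return sill * (1.0 - math.exp(-h / corr_range))


def krige_residual(points, residuals, target_point):
    """Ordinary kriging of the trend residuals at one location."""
    n = len(points)
    a = [[variogram(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to 1
    b = [variogram(math.dist(p, target_point)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # last unknown is the Lagrange multiplier
    return sum(w * r for w, r in zip(weights, residuals))


def regression_kriging(points, covariate, target, new_point, new_covariate):
    """Trend prediction plus kriged residual."""
    b0, b1 = fit_trend(covariate, target)
    residuals = [t - (b0 + b1 * c) for t, c in zip(target, covariate)]
    return b0 + b1 * new_covariate + krige_residual(points, residuals, new_point)


# Made-up sample locations, one covariate, and one ILR coordinate as the target.
points = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0), (50.0, 40.0)]
covariate = [1.0, 2.0, 1.5, 3.0, 2.2]
target = [0.2, 0.9, 0.4, 1.6, 1.1]

prediction = regression_kriging(points, covariate, target, (60.0, 60.0), 2.0)
at_sample = regression_kriging(points, covariate, target, points[0], covariate[0])
print(prediction, at_sample)
```

Because the assumed variogram has no nugget, the kriged residual reproduces the observed residual at a sampling point, so the RK prediction there returns the observed value. This is one way to read the statement that RK maps retain more detail around the sampling points.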
Highlights
Owing to “spurious correlations” (Pawlowsky-Glahn, 1984), traditional statistical methods based on Euclidean geometry may produce errors when applied directly to soil particle-size fraction (PSF) data (Filzmoser et al., 2009)
We compared the generalized linear model (GLM), random forest (RF), and their hybrid patterns (RK) using different transformed data based on three isometric log-ratio (ILR) balances, with 29 environmental covariables (ECs) for the prediction of soil PSFs in the upper reaches of the Heihe River Basin, China
The comparison of means and medians demonstrated that the back-transformed means of the three sets of ILR-transformed data were the same, and that the ILR mean of sand was closer to the median than was the case for the original soil PSF data
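The equality of the back-transformed means can be checked numerically. The sketch below assumes a standard pivot-type ILR basis for three parts, three cyclic orderings of (sand, silt, clay), and made-up fractions. The orderings actually used in the study are not stated here, and the function names are hypothetical.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def ilr(x):
    """ILR balances of a 3-part composition: (x1 | x2, x3), then (x2 | x3)."""
    x1, x2, x3 = x
    z1 = math.sqrt(2 / 3) * math.log(x1 / math.sqrt(x2 * x3))
    z2 = math.sqrt(1 / 2) * math.log(x2 / x3)
    return [z1, z2]


def ilr_inv(z):
    """Inverse of ilr(): map two balances back to a closed 3-part composition."""
    z1, z2 = z
    clr = [
        math.sqrt(2 / 3) * z1,
        -z1 / math.sqrt(6) + z2 / math.sqrt(2),
        -z1 / math.sqrt(6) - z2 / math.sqrt(2),
    ]
    return closure([math.exp(c) for c in clr])


def back_transformed_mean(samples, order):
    """Average in ILR space for one ordering, returned as (sand, silt, clay)."""
    coords = [ilr([s[i] for i in order]) for s in samples]
    mean_z = [sum(c[k] for c in coords) / len(coords) for k in range(2)]
    permuted = ilr_inv(mean_z)
    result = [0.0] * 3
    for position, i in enumerate(order):
        result[i] = permuted[position]  # undo the permutation
    return result


# Made-up (sand, silt, clay) fractions, for illustration only.
samples = [
    [0.60, 0.30, 0.10],
    [0.25, 0.55, 0.20],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
]
names = ("sand", "silt", "clay")
orders = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # assumed orderings

means = {order: back_transformed_mean(samples, order) for order in orders}
for order, m in means.items():
    label = "-".join(names[i] for i in order)
    print(label, [round(v, 4) for v in m])
```

All three orderings return the same composition because the back-transformed ILR mean is the closed geometric mean of the samples, which does not depend on the basis. The ILR coordinates themselves do differ between orderings, which is why their distributions can differ.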
Summary
Owing to “spurious correlations” (Pawlowsky-Glahn, 1984), traditional statistical methods based on Euclidean geometry may produce errors when applied directly to soil PSF data (Filzmoser et al., 2009). For local-scale study areas, geostatistical models, i.e., ordinary kriging (OK) and compositional kriging, combined with log-ratio transformed data, are sufficient to map spatial patterns, as shown in our previous study (Wang and Shi, 2017). From another perspective, functional compositions combined with the kriging method can be applied to produce soil particle-size curves (PSCs) (Menafoglio et al., 2014), providing an abundance of information. An increasing number of studies have concentrated on mapping soil PSFs using different machine-learning models combined with ancillary data (i.e., environmental covariables, ECs) and log-ratio transformed data at the broad basin scale (Zhang et al., 2020), national scale (Akpa et al., 2014), and even global scale (Hengl et al., 2017). Among log-ratio methods, the ILR method performs better than the additive log-ratio (ALR) and centered log-ratio (CLR) methods, both in theory and in practice.
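One theoretical difference between the three transformations can be sketched with two made-up compositions. CLR coordinates always sum to zero, so they are linearly dependent. ALR coordinates are not isometric, so Euclidean distances computed on them differ from the Aitchison distance. ILR coordinates avoid both problems. The pivot-type ILR basis and the function names below are assumptions made for illustration.

```python
import math


def alr(x):
    """Additive log-ratio, using the last part as the denominator."""
    return [math.log(p / x[-1]) for p in x[:-1]]


def clr(x):
    """Centered log-ratio: log of each part over the geometric mean."""
    log_gm = sum(math.log(p) for p in x) / len(x)
    return [math.log(p) - log_gm for p in x]


def ilr(x):
    """ILR balances of a 3-part composition: (x1 | x2, x3), then (x2 | x3)."""
    x1, x2, x3 = x
    z1 = math.sqrt(2 / 3) * math.log(x1 / math.sqrt(x2 * x3))
    z2 = math.sqrt(1 / 2) * math.log(x2 / x3)
    return [z1, z2]


def euclid(u, v):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two made-up (sand, silt, clay) compositions.
a = [0.60, 0.30, 0.10]
b = [0.25, 0.55, 0.20]

d_clr = euclid(clr(a), clr(b))  # Aitchison distance
d_ilr = euclid(ilr(a), ilr(b))  # equals d_clr: ILR is an isometry
d_alr = euclid(alr(a), alr(b))  # generally differs from d_clr

print("sum of CLR coordinates:", sum(clr(a)))
print("CLR distance:", d_clr)
print("ILR distance:", d_ilr)
print("ALR distance:", d_alr)
```

The ILR distance matches the CLR distance while using only two unconstrained coordinates for three parts, which is the property that lets standard regression and kriging tools be applied to the coordinates.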