Performance of linear mixed models and random forests for spatial prediction of soil pH

Mirriam Makungwe, Lydia Mumbi Chabala, Benson H Chishala, R Murray Lark

doi:10.1016/j.geoderma.2021.115079

Open Access

Journal: Geoderma	Publication Date: Apr 2, 2021
Citations: 33	License type: cc-by-nc-nd

Affiliation: University of Zambia, University of Nottingham

Abstract

Digital soil maps describe the spatial variation of soil properties and give policy makers a synoptic view of the state of the soil. This paper presents a study of how to map the spatial variation of soil pH across Zambia, undertaken as part of a project to assess suitability for rice production across the country. Legacy data on the target variable were available, along with exhaustive environmental covariates as potential predictor variables. We had the option of undertaking spatial prediction by geostatistical or machine learning methods. We set out to compare the approaches from the selection of predictor variables through to model validation, and to test the predictors on a set of validation observations. We also addressed the problem of how to validate models robustly from legacy data when these have, as is often the case, a strongly clustered spatial distribution. The validation statistics showed that the empirical best linear unbiased predictor (EBLUP) with a constant mean as the only fixed effect (ordinary kriging) performed better than the other methods. Random forests had the largest model-based estimates of the expected squared errors. We also noticed that the random forest algorithm was prone to select simulated spatially correlated random variables as “important”.
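As an illustration of the predictor named in the abstract, the following is a minimal sketch of ordinary kriging, that is, prediction with an unknown constant mean. It is not the authors' implementation. It assumes an exponential covariance function with fixed, hypothetical sill and range parameters, whereas an EBLUP would estimate the covariance parameters from the data (for example by residual maximum likelihood). The function names, coordinates and pH values are invented for illustration only.

```python
import numpy as np


def exp_cov(h, sill=1.0, range_=10.0):
    """Exponential covariance at distance h; sill and range_ are assumed values."""
    return sill * np.exp(-h / range_)


def ordinary_kriging(coords, values, target, sill=1.0, range_=10.0):
    """Ordinary kriging prediction at `target` (unknown constant mean).

    Returns (prediction, kriging_variance, weights).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)

    # Distances among observations, and from each observation to the target
    d_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d_tgt = np.linalg.norm(coords - target, axis=1)

    # Kriging system: covariances bordered by ones, with a Lagrange
    # multiplier that forces the weights to sum to one
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(d_obs, sill, range_)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_cov(d_tgt, sill, range_)

    sol = np.linalg.solve(A, b)
    weights, lagrange = sol[:n], sol[n]

    prediction = weights @ values
    variance = sill - weights @ b[:n] - lagrange
    return prediction, variance, weights


# Hypothetical observations: (x, y) locations and soil pH values
coords = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [7.0, 7.0]]
ph = [5.2, 6.1, 5.8, 6.4]
pred, var, w = ordinary_kriging(coords, ph, target=[3.0, 3.0])
```

The kriging variance returned here is a model-based estimate of the expected squared prediction error, the kind of quantity the abstract compares across methods. Its value depends entirely on the assumed covariance parameters.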

Full Text