Abstract

Legacy soil datasets are a valuable resource and should be used to the greatest extent possible. However, such datasets may be incomplete, and lack observations for every attribute, as the dataset may be compiled from multiple studies. To use these datasets in soil mapping and modeling work, it is useful to fill the gaps in the dataset with estimated values. Machine learning is an approach that can provide estimates with high accuracy. In this study, the machine learner Random Forest (RF) was used to estimate bulk density values in an existing dataset from the province of British Columbia (BC), Canada, which was used as a case study dataset. As the dataset had missing observations across all attributes, multiple models needed to be generated and tested to determine the accuracy of the estimated values produced. A total of 512 models were tested using RF, which were then ranked based on the concordance correlation coefficient (CCC) of their estimates; the CCC of all models ranged from 0.51 to 0.92. The estimates of 27 of these models were then used to fill the missing observations in the dataset; the accuracy of these models ranged from a CCC of 0.63 to 0.92. Further, uncertainty estimates for the predictions were generated using quantile regression, which was coupled with the RF approach. Each model tested therefore had an accuracy measurement and an uncertainty estimate. This approach, of using multiple models developed in RF, can be applied to other legacy soil datasets with inconsistently missing values to produce estimates which can fill the missing observations, and produce uncertainty estimates for those estimates.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.