Abstract
It is quite common in digital soil mapping (DSM) to quantify the uncertainty of issued predictions, that is, to make probabilistic predictions. Yet, little attention has been paid to their validation. Probabilistic predictions are only of value to end users if they are reliable and, ideally, also sharp. Reliability refers to the consistency between predicted conditional probabilities and observed frequencies in independent test data. Sharpness refers to the concentration, i.e. the narrowness, of a conditional probability distribution. The prediction interval coverage probability (PICP) is currently used in DSM to validate the reliability of prediction intervals, but it is insensitive to a potential one-sided bias of the interval boundaries. We therefore propose to extend the current validation procedure with metrics used in the broader probabilistic forecasting literature. These metrics evaluate probabilistic predictions not only in prediction interval format but also in the form of quantiles or full conditional probability distributions. We suggest the quantile coverage probability (QCP) and the probability integral transform (PIT) histogram as alternatives to PICP, and proper scoring rules for relative comparisons of competing probabilistic models. As scoring rules, we present the interval score (IS) and the continuous ranked probability score (CRPS), whose decomposition yields a reliability part (RELI). We illustrate the use of these metrics in a case study using soil pH and soil organic carbon from the LUCAS-soil database, in which we compare the probabilistic predictions of five models: a reference null model (NM), quantile regression forest (QRF), quantile regression post-processing of a random forest (QRPP RF), kriging with external drift (KED) and quantile regression neural network (QRNN). For KED and QRNN, we found a one-sided bias. This was not apparent from PICP but was revealed by the PIT histogram and QCP.
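The coverage-based metrics and the interval score named above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, predictions are assumed to be available as arrays of interval bounds, quantiles or predictive samples per test location, and the IS follows the common definition for central (1 - alpha) prediction intervals. The usage lines reproduce the point made in the abstract with made-up data: a 90% interval whose misses all lie above the upper bound gives a nominal PICP, while the QCP of each bound exposes the one-sided bias.

```python
import numpy as np


def picp(y, lower, upper):
    """Prediction interval coverage probability: share of observations
    that fall inside [lower, upper]."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))


def qcp(y, q_pred):
    """Quantile coverage probability: share of observations at or below
    the predicted tau-quantile. For reliable predictions it is close to tau."""
    y, q_pred = np.asarray(y), np.asarray(q_pred)
    return float(np.mean(y <= q_pred))


def pit_values(y, samples):
    """PIT value per observation, estimated as the fraction of predictive
    samples (one row per test location) that are <= the observation.
    A roughly flat histogram of these values, e.g.
    np.histogram(pit, bins=10, range=(0, 1)), indicates reliability."""
    y = np.asarray(y)[:, None]
    return np.mean(np.asarray(samples) <= y, axis=1)


def interval_score(y, lower, upper, alpha):
    """Mean interval score of central (1 - alpha) prediction intervals:
    interval width plus 2/alpha times the distance of observations
    lying outside the interval (lower is better)."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    below = (2.0 / alpha) * np.maximum(lower - y, 0.0)
    above = (2.0 / alpha) * np.maximum(y - upper, 0.0)
    return float(np.mean((upper - lower) + below + above))


# Usage: a hypothetical 90% interval whose misses all lie above the upper bound.
y = np.arange(100.0)
lower, upper = np.full(100, -1.0), np.full(100, 89.5)
print(picp(y, lower, upper))         # 0.9, looks well calibrated
print(qcp(y, lower), qcp(y, upper))  # 0.0 and 0.9 instead of 0.05 and 0.95
```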
RELI summarized the trends found in QCP, PICP and PIT histograms in a single numerical value. CRPS and IS penalized outliers and low sharpness particularly strongly. According to CRPS and IS, the best probabilistic predictions were obtained by QRF and QRPP RF and the worst by NM.
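The CRPS and its reliability part can be sketched in the same way. This sketch rests on two assumptions made here for illustration: each predictive distribution is represented by equally weighted samples (for instance, predicted quantiles treated as an ensemble), and RELI is computed with the ensemble decomposition of Hersbach (2000), CRPS = RELI + CRPS_pot. The function names are hypothetical.

```python
import numpy as np


def crps_ensemble(y, samples):
    """Mean CRPS of empirical predictive CDFs built from equally weighted
    samples, using CRPS = E|X - y| - 0.5 * E|X - X'|."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(x - y[:, None]), axis=1)
    term2 = np.mean(np.abs(x[:, :, None] - x[:, None, :]), axis=(1, 2))
    return float(np.mean(term1 - 0.5 * term2))


def crps_decomposition(y, samples):
    """Assumption: RELI as in Hersbach's (2000) ensemble decomposition.
    Returns (reli, crps_pot); their sum equals crps_ensemble(y, samples)."""
    y = np.asarray(y, dtype=float)
    x = np.sort(np.asarray(samples, dtype=float), axis=1)
    n_obs, m = x.shape
    p = np.arange(m + 1) / m  # nominal CDF level in each of the m + 1 bins
    alpha = np.zeros((n_obs, m + 1))  # bin length left of the observation
    beta = np.zeros((n_obs, m + 1))   # bin length right of the observation
    # Inner bins i = 1..m-1 lie between sorted samples x[:, i-1] and x[:, i].
    lo, hi = x[:, :-1], x[:, 1:]
    y_clip = np.clip(y[:, None], lo, hi)
    alpha[:, 1:m] = y_clip - lo
    beta[:, 1:m] = hi - y_clip
    # Outlier bins below the smallest and above the largest sample.
    beta[:, 0] = np.maximum(x[:, 0] - y, 0.0)
    alpha[:, m] = np.maximum(y - x[:, -1], 0.0)
    a_bar, b_bar = alpha.mean(axis=0), beta.mean(axis=0)
    g = a_bar + b_bar  # average bin width
    o = np.divide(b_bar, g, out=np.zeros_like(g), where=g > 0)  # observed frequency
    # Outlier bins use observed outlier frequencies.
    o[0] = np.mean(y < x[:, 0])
    o[m] = np.mean(y < x[:, -1])
    g[0] = b_bar[0] / o[0] if o[0] > 0 else 0.0
    g[m] = a_bar[m] / (1.0 - o[m]) if o[m] < 1 else 0.0
    reli = float(np.sum(g * (o - p) ** 2))
    crps_pot = float(np.sum(g * o * (1.0 - o)))
    return reli, crps_pot


# Usage with hypothetical data: predictive samples centred at 0,
# observations centred at 0.5 (a one-sided bias that inflates RELI).
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 50))
y_obs = rng.normal(loc=0.5, size=500)
reli, crps_pot = crps_decomposition(y_obs, samples)
print(crps_ensemble(y_obs, samples), reli, crps_pot)
```

Because RELI is a width-weighted squared distance between the nominal levels p_i and the observed frequencies o_i, it condenses the information of a PIT histogram into one number, which matches the role of RELI described above.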