Abstract
There is growing interest in digital soil mapping (DSM) in quantifying the uncertainty of point predictions through probabilistic prediction models. Uncertainty in DSM is often expressed as a prediction interval (PI). Yet PIs, and uncertainty estimates in general, are only of value to end users if they are of good quality, i.e. if they are reliable. Reliability refers to the consistency between predicted conditional probabilities and the observed frequencies in independent validation data. Ideally, PIs are also sharp, where sharpness refers to the concentration of the probabilistic information, i.e. the narrowness of the conditional probability distributions. The prediction interval coverage probability (PICP) is currently used in DSM to assess the reliability of PIs, but it is insensitive to a one-sided bias in the quantiles that bound the interval. We therefore propose to complement the current validation procedure with metrics from the broader literature on probabilistic prediction. These include metrics that evaluate not only uncertainty estimates in PI format but also individual quantiles and full conditional probability distributions. The proposed metrics are the quantile coverage probability (QCP), the probability integral transform (PIT) and so-called proper scoring rules for relative comparisons. Examples of scoring rules are the interval score (IS) and the continuous ranked probability score (CRPS), the latter of which can be decomposed to yield a reliability part (RELI). Sharpness can be evaluated through the prediction interval width (PIW). We illustrate the use of these metrics in a case study using soil pH data from the Land Use and Coverage Area Frame Survey (LUCAS), in which we compare the uncertainty estimates of five models: kriging with external drift (KED), quantile regression forest (QRF), quantile regression post-processing of a random forest (QRPP RF), quantile regression neural network (QRNN) and a reference null model (NM).
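The metrics named above can be sketched in a few lines of Python. This is a minimal, non-authoritative illustration rather than the authors' implementation: all function names are hypothetical, the PIT and closed-form CRPS helpers assume a Gaussian predictive distribution (for example a prediction plus a kriging variance), and the interval score uses its common form from the scoring-rule literature. The quantile-based CRPS relies on the identity that the CRPS equals twice the integral of the pinball (quantile) loss over all quantile levels, approximated here by an average over evenly spaced levels. The RELI decomposition is not sketched.

```python
"""Minimal sketches of uncertainty validation metrics (hypothetical helper names).

Assumption: each validation location has an observation y and either
predicted quantiles / PI bounds or a Gaussian predictive mean and sd.
"""
import math
from statistics import NormalDist, fmean


def qcp(obs, q):
    """QCP: share of observations at or below the predicted tau-quantile (ideal: tau)."""
    return fmean(1.0 if y <= qi else 0.0 for y, qi in zip(obs, q))


def picp(obs, lower, upper):
    """PICP: share of observations inside [lower, upper] (ideal: nominal coverage)."""
    return fmean(1.0 if lo <= y <= up else 0.0 for y, lo, up in zip(obs, lower, upper))


def mean_piw(lower, upper):
    """Mean PI width; smaller means sharper (only meaningful if also reliable)."""
    return fmean(up - lo for lo, up in zip(lower, upper))


def interval_score(y, lo, up, alpha):
    """Interval score of a central (1 - alpha) PI; lower is better.

    Width plus a 2/alpha penalty on the distance to whichever bound is missed.
    """
    score = up - lo
    if y < lo:
        score += (2.0 / alpha) * (lo - y)
    elif y > up:
        score += (2.0 / alpha) * (y - up)
    return score


def pit_gaussian(y, mu, sd):
    """PIT value F(y), assuming a Gaussian predictive distribution.

    For a reliable model, PIT values over many locations are roughly uniform on [0, 1].
    """
    return NormalDist(mu, sd).cdf(y)


def crps_gaussian(y, mu, sd):
    """Closed-form CRPS of a N(mu, sd**2) predictive distribution; lower is better."""
    z = (y - mu) / sd
    std = NormalDist()
    return sd * (z * (2.0 * std.cdf(z) - 1.0) + 2.0 * std.pdf(z) - 1.0 / math.sqrt(math.pi))


def pinball_loss(y, q, tau):
    """Quantile (pinball) loss of a predicted tau-quantile q for observation y."""
    return ((1.0 if y < q else 0.0) - tau) * (q - y)


def crps_from_quantiles(y, levels, quantiles):
    """Approximate CRPS as 2 x mean pinball loss over evenly spaced levels in (0, 1)."""
    return 2.0 * fmean(pinball_loss(y, q, tau) for tau, q in zip(levels, quantiles))


if __name__ == "__main__":
    # Toy values for illustration only, not from the LUCAS case study.
    obs = [5.1, 6.3, 7.0, 4.8]
    lower = [4.5, 5.5, 6.0, 5.0]  # predicted 5 % quantiles
    upper = [6.0, 7.0, 7.5, 6.5]  # predicted 95 % quantiles
    print("QCP at tau = 0.05:", qcp(obs, lower))
    print("PICP of the 90 % PI:", picp(obs, lower, upper))
    print("mean PIW:", mean_piw(lower, upper))
    print("mean IS:", fmean(interval_score(y, lo, up, 0.10)
                            for y, lo, up in zip(obs, lower, upper)))
    # Quantile-based CRPS of a standard normal prediction vs. the closed form.
    levels = [(k + 0.5) / 999 for k in range(999)]
    quantiles = [NormalDist().inv_cdf(t) for t in levels]
    print(crps_from_quantiles(0.0, levels, quantiles), crps_gaussian(0.0, 0.0, 1.0))
```

Because IS and CRPS are proper scoring rules, they are best used for relative comparisons between models on the same validation data, whereas QCP, PICP and the PIT histogram are judged against their nominal values.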
KED, NM and QRPP RF showed very good reliability according to QCP, PICP, PIT and RELI. QRF was slightly pessimistic in the centre of the conditional probability distributions, and QRNN was strongly overoptimistic at their edges. Despite this, QRF performed best according to mean CRPS and mean IS because it produced fewer outliers at the edges. As expected, NM had the lowest sharpness, i.e. the largest PIW values. The sharpness of the other models was similar overall, but QRNN gave sharper predictions at the edges and QRF less sharp predictions in the centre of the conditional probability distributions. Lastly, we generated PIW maps to show the spatial pattern of uncertainty for the five prediction models. The spatial variability of PIW was larger for QRF, QRNN and QRPP RF than for KED, whereas for NM, PIW was completely uniform.