A Tutorial on Distribution-Free Uncertainty Quantification Using Conformal Prediction

Abstract

Statistical prediction models are ubiquitous in psychological research and practice. Increasingly, machine-learning models are used. Quantifying the uncertainty of such predictions is rarely considered, partly because prediction intervals are not defined for many of the algorithms used. However, generating and reporting prediction models without information on the uncertainty of the predictions carries the risk of overinterpreting their accuracy. Conventional methods for prediction intervals (e.g., those defined for ordinary least squares regression) are sensitive to violations of several distributional assumptions. In this tutorial, we introduce conformal prediction, a model-agnostic, distribution-free method for generating prediction intervals with guaranteed marginal coverage, to psychological research. We start by introducing the basic rationale of prediction intervals using a motivating example. Then, we proceed to conformal prediction, which is illustrated in three increasingly complex examples using publicly available data and R code.
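The split-conformal procedure the tutorial builds on is short enough to sketch directly. The paper's own examples use R; the following is a minimal Python illustration on simulated data with a least-squares predictor, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data standing in for any regression problem: y = 2x + noise
x = rng.uniform(0, 10, 1000)
y = 2 * x + rng.normal(0, 1, 1000)

# Split the data: one half trains the model, the other half calibrates it
x_train, y_train = x[:500], y[:500]
x_cal, y_cal = x[500:], y[500:]

# Any point predictor works; here, an ordinary least-squares line
slope, intercept = np.polyfit(x_train, y_train, 1)
predict = lambda v: slope * v + intercept

# Nonconformity scores: absolute residuals on the calibration set
scores = np.abs(y_cal - predict(x_cal))

# Conformal quantile for 90% marginal coverage (alpha = 0.1):
# the ceil((n + 1)(1 - alpha))/n empirical quantile of the scores
alpha = 0.1
n = len(scores)
qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Prediction interval for a new observation
x_new = 5.0
lower, upper = predict(x_new) - qhat, predict(x_new) + qhat
```

Whatever replaces the least-squares line (a random forest, a neural network), the interval ŷ ± qhat covers the true value for about 90% of new exchangeable observations; no distributional assumptions are needed.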

Similar Papers
  • Research Article
  • Cited by 32
  • 10.1016/j.jss.2006.12.548
Statistical models vs. expert estimation for fault prediction in modified code – an industrial case study
  • Dec 22, 2006
  • Journal of Systems and Software
  • Piotr Tomaszewski + 3 more

  • Conference Article
  • Cited by 3
  • 10.1109/ssci50451.2021.9659853
Investigating Normalized Conformal Regressors
  • Dec 5, 2021
  • Ulf Johansson + 2 more

Conformal prediction can be applied on top of any machine learning predictive regression model, thus turning it into a conformal regressor. Given a significance level ε, conformal regressors output valid prediction intervals, i.e., the probability that the interval covers the true value is exactly 1 − ε. To obtain validity, a calibration set that is not used for training the model must be set aside. In standard inductive conformal regression, the size of the prediction intervals is then determined by the absolute error made by the predictive model on a specific instance in the calibration set, where different significance levels correspond to different instances. In this setting, all prediction intervals will have the same size, making the resulting models very unspecific. When adding a technique called normalization, however, the difficulty of each instance is estimated, and the interval sizes are adjusted accordingly. An integral part of normalized conformal regressors is a parameter called β, which determines the relative importance of the difficulty estimation and the error of the model. In this study, the effects of different underlying models, difficulty estimation functions and β-values are investigated. The results from a large empirical study, using twenty publicly available data sets, show that better difficulty estimation functions will lead to both tighter and more specific prediction intervals. Furthermore, it is found that the β-values used strongly affect the conformal regressor. While there is no specific β-value that will always minimize the interval sizes, lower β-values lead to more variation in the interval sizes, i.e., more specific models. In addition, the analysis also identifies that the normalization procedure introduces a small but unfortunate bias in the models. More specifically, normalization using low β-values means that smaller intervals are more likely to be erroneous, while the opposite is true for higher β-values.
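The normalization idea described above, scaling each interval by a per-instance difficulty estimate plus β, can be sketched as follows. Here the difficulty function is assumed known for illustration; in practice it would itself be fitted (e.g., to the absolute residuals):

```python
import numpy as np

rng = np.random.default_rng(1)

# Heteroscedastic data: noise grows with x, so fixed-width intervals are wasteful
x = rng.uniform(0, 10, 2000)
y = 2 * x + rng.normal(0, 0.2 + 0.3 * x)
x_fit, y_fit = x[:1000], y[:1000]
x_cal, y_cal = x[1000:], y[1000:]

slope, intercept = np.polyfit(x_fit, y_fit, 1)
predict = lambda v: slope * v + intercept

# Difficulty estimate sigma(x): assumed known here; in practice a model of the
# absolute residuals
sigma = lambda v: 0.2 + 0.3 * v

# beta balances the difficulty estimate against the raw model error
beta = 0.1
scores = np.abs(y_cal - predict(x_cal)) / (sigma(x_cal) + beta)

alpha = 0.1
n = len(scores)
qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Normalized interval: wider where sigma(x) marks the instance as difficult
half_width = lambda v: qhat * (sigma(v) + beta)
```

A large β drowns out the difficulty estimate and the intervals approach a common width; a small β lets interval sizes vary strongly across instances, which is the trade-off the study investigates.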

  • Research Article
  • Cited by 1
  • 10.1117/1.jmm.20.4.041206
Uncertainty quantification of machine learning models: on conformal prediction
  • Oct 15, 2021
  • Journal of Micro/Nanopatterning, Materials, and Metrology
  • Inimfon I Akpabio + 1 more

Background: Machine learning is predicted to have an increasingly important role in semiconductor metrology. Prediction intervals that describe the reliability of the predictive performance of machine learning models are important to guide decision making and to improve trust in deep learning and other forms of machine learning and artificial intelligence. Image processing is an important application of artificial intelligence. Low-dose images from the scanning electron microscope (SEM) are often used for roughness measurements such as line edge roughness (LER) because of relatively small acquisition times and resist shrinkage, but such images are corrupted by noise, blur, edge effects, and other instrument errors. LER affects semiconductor device performance and the yield of the manufacturing process. Aim: We consider prediction intervals for the deep convolutional neural network EDGENet, which was trained on a large dataset of simulated SEM images and directly estimates the edge positions from a SEM rough line image containing an unknown level of Poisson noise. Approach: Conformal prediction is a relatively recent, increasingly popular, rigorously proven, and simple methodology to address this need for both classification and regression problems, and it does not use distributional assumptions such as Gaussianity or the Bayesian framework; one new variant combines it with another technique to generate prediction intervals known as quantile regression. Results: We illustrate the strengths and limitations of different conformal prediction procedures for the EDGENet approach to LER estimation. Combining these approaches into ensemble schemes and incorporating domain knowledge produces more informative prediction intervals. Conclusions: Deep learning models can help in the estimation of LER, but their acceptance has been hindered by a lack of trust in these techniques. 
Prediction intervals that provide coverage guarantees are an approach to alleviate this problem and may catalyze the transition within semiconductor manufacturing to a wider acceptance and implementation of machine learning.

  • Research Article
  • Cited by 17
  • 10.1016/j.neunet.2024.106203
How to evaluate uncertainty estimates in machine learning for regression?
  • Feb 22, 2024
  • Neural Networks
  • Laurens Sluijterman + 2 more

As neural networks become more popular, the need for accompanying uncertainty estimates increases. There are currently two main approaches to test the quality of these estimates. Most methods output a density. They can be compared by evaluating their loglikelihood on a test set. Other methods output a prediction interval directly. These methods are often tested by examining the fraction of test points that fall inside the corresponding prediction intervals. Intuitively, both approaches seem logical. However, we demonstrate through both theoretical arguments and simulations that both ways of evaluating the quality of uncertainty estimates have serious flaws. Firstly, both approaches cannot disentangle the separate components that jointly create the predictive uncertainty, making it difficult to evaluate the quality of the estimates of these components. Specifically, the quality of a confidence interval cannot reliably be tested by estimating the performance of a prediction interval. Secondly, the loglikelihood does not allow a comparison between methods that output a prediction interval directly and methods that output a density. A better loglikelihood also does not necessarily guarantee better prediction intervals, which is what the methods are often used for in practice. Moreover, the current approach to test prediction intervals directly has additional flaws. We show why testing a prediction or confidence interval on a single test set is fundamentally flawed. At best, marginal coverage is measured, implicitly averaging out overconfident and underconfident predictions. A much more desirable property is pointwise coverage, requiring the correct coverage for each prediction. We demonstrate through practical examples that these effects can result in favouring a method, based on the predictive uncertainty, that has undesirable behaviour of the confidence or prediction intervals. 
Finally, we propose a simulation-based testing approach that addresses these problems while still allowing easy comparison between different methods. This approach can be used for the development of new uncertainty quantification methods.
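The averaging-out effect the authors describe is easy to reproduce in simulation. In this hypothetical sketch, a single fixed-width interval on heteroscedastic data shows near-nominal marginal coverage while overcovering easy points and undercovering hard ones:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two regimes with different noise levels, and one fixed-width interval whose
# width is tuned so that *average* coverage lands near the nominal 90%
x = rng.uniform(0, 10, 100_000)
y = rng.normal(0, np.where(x < 5, 0.5, 1.5))  # true mean is 0 everywhere

half_width = 1.95
covered = np.abs(y) <= half_width

marginal = covered.mean()        # looks close to nominal...
easy = covered[x < 5].mean()     # ...but easy points are overcovered
hard = covered[x >= 5].mean()    # ...and hard points are undercovered
```

Marginal coverage near 90% here hides essentially 100% coverage in the low-noise region and roughly 80% in the high-noise one, which is the paper's argument for evaluating pointwise coverage via simulation rather than a single test set.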

  • Research Article
  • Cited by 4
  • 10.1021/acsomega.4c02017
Development and Evaluation of Conformal Prediction Methods for Quantitative Structure-Activity Relationship.
  • Jun 27, 2024
  • ACS omega
  • Yuting Xu + 3 more

The quantitative structure-activity relationship (QSAR) regression model is a commonly used technique for predicting the biological activities of compounds from their molecular descriptors. Besides accurate activity estimation, obtaining a prediction uncertainty metric such as a prediction interval is highly desirable. Quantifying prediction uncertainty is an active research area in statistics and machine learning (ML), but implementation for QSAR remains challenging, and most ML algorithms with high predictive performance require add-on companions for estimating the uncertainty of their predictions. Conformal prediction (CP) is a promising approach, as its main components are agnostic to the prediction model, and it produces valid prediction intervals under weak assumptions on the data distribution. We propose computationally efficient CP algorithms tailored to the most widely used ML models, including random forests, deep neural networks, and gradient boosting. The algorithms use a novel approach to deriving nonconformity scores from estimates of prediction uncertainty generated by ensembles of point predictions. The validity and efficiency of the proposed algorithms are demonstrated on a diverse collection of QSAR data sets as well as in simulation studies. The provided software implementing our algorithms can be used stand-alone or easily incorporated into other ML software packages for QSAR modeling.

  • Research Article
  • Cited by 50
  • 10.1016/j.trc.2018.05.012
Quantifying uncertainty in short-term traffic prediction and its application to optimal staffing plan development
  • May 19, 2018
  • Transportation Research Part C: Emerging Technologies
  • Lei Lin + 5 more

  • Research Article
  • Cited by 2
  • 10.1016/j.eswa.2023.123087
Conformal prediction of option prices
  • Jan 4, 2024
  • Expert Systems with Applications
  • João A Bastos

  • Research Article
  • Cited by 79
  • 10.1016/j.xphs.2020.09.055
Predicting With Confidence: Using Conformal Prediction in Drug Discovery
  • Oct 17, 2020
  • Journal of Pharmaceutical Sciences
  • Jonathan Alvarsson + 3 more

One of the challenges with predictive modeling is how to quantify the reliability of the models' predictions on new objects. In this work we give an introduction to conformal prediction, a framework that sits on top of traditional machine learning algorithms and which outputs valid confidence estimates to predictions from QSAR models in the form of prediction intervals that are specific to each predicted object. For regression, a prediction interval consists of an upper and a lower bound. For classification, a prediction interval is a set that contains none, one, or many of the potential classes. The size of the prediction interval is affected by a user-specified confidence/significance level, and by the nonconformity of the predicted object; i.e., the strangeness as defined by a nonconformity function. Conformal prediction provides a rigorous and mathematically proven framework for in silico modeling with guarantees on error rates as well as a consistent handling of the models’ applicability domain intrinsically linked to the underlying machine learning model. Apart from introducing the concepts and types of conformal prediction, we also provide an example application for modeling ABC transporters using conformal prediction, as well as a discussion on general implications for drug discovery.
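The classification variant described here, where the prediction set may contain none, one, or many classes, can be sketched with a simple nonconformity function: one minus the probability assigned to the true class. The simulated probabilities below stand in for any probabilistic classifier:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated class probabilities stand in for any probabilistic classifier
n_cal, n_classes = 1000, 3
probs_cal = rng.dirichlet(np.ones(n_classes), n_cal)
labels_cal = np.array([rng.choice(n_classes, p=p) for p in probs_cal])

# Nonconformity score: one minus the probability assigned to the true class
scores = 1 - probs_cal[np.arange(n_cal), labels_cal]

alpha = 0.1
qhat = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal,
                   method="higher")

# Prediction set for a new example: keep every class whose score clears qhat
probs_new = np.array([0.80, 0.15, 0.05])
pred_set = [k for k in range(n_classes) if 1 - probs_new[k] <= qhat]
```

Confident predictions yield small sets and ambiguous ones yield large sets, while roughly 90% of sets contain the true class, the per-object "strangeness" behavior the abstract describes.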

  • Conference Article
  • 10.56952/arma-2025-0602
Modeling with Confidence: Leveraging Conformal Prediction for Calibrated Machine Learning Based Mechanical and Petrophysical Models
  • Jun 8, 2025
  • Bo Zhang + 1 more

ABSTRACT: Machine learning (ML) is revolutionizing reservoir characterization practices by directly using well log and drilling data. However, deterministic predictions of ML models can be misleading and result in expensive mistakes. Hence, uncertainties in ML-predicted mechanical and/or petrophysical properties need to be well quantified. By leveraging a distribution-free and computationally efficient uncertainty quantification method called conformal prediction (CP), we can derive calibrated ML models and quantify the uncertainties in their predictions. Several ML models are developed for permeability and elastic modulus prediction in a geothermal site in south Saskatchewan. The CatBoost model outperforms the other models, achieving an R² of 0.91 and 0.92 for permeability and elastic modulus, respectively. A conformal predictor is then built on the selected ML models to complement the predictions with valid prediction intervals at 95% coverage. For a test well, where two different lab-measured permeabilities exist, more than 90% of measured permeabilities fall within the 95% prediction interval. Triaxial geomechanical test results are also comfortably within the bounds of the 95% interval. This suggests that these models provide reliable predictions with limited uncertainties. This paper underscores the crucial role of uncertainty quantification in ML-based prediction models. The study demonstrates how quantifying uncertainty can enhance our confidence in ML-predicted reservoir properties for rigorous subsurface reservoir characterization.

  • Discussion
  • Cited by 1
  • 10.1161/circheartfailure.121.009278
Unleashing the Power of Machine Learning to Predict Myocardial Recovery After Left Ventricular Assist Device: A Call for the Inclusion of Unstructured Data Sources in Heart Failure Registries.
  • Dec 24, 2021
  • Circulation: Heart Failure
  • Ramsey M Wehbe

  • Conference Article
  • Cited by 2
  • 10.1117/12.2600838
Uncertainty quantification of machine learning models: on conformal prediction
  • Oct 12, 2021
  • Inimfon I Akpabio + 1 more

Prediction intervals which describe the reliability of the predictive performance of machine learning models are important to guide decision making and to improve trust in deep learning and other forms of machine learning and artificial intelligence. Conformal prediction is a relatively recent, increasingly popular, rigorously proven and simple methodology to address this need for both classification and regression problems, and it does not use distributional assumptions like Gaussianity or the Bayesian framework; one new variant combines it with another technique to generate prediction intervals known as quantile regression. We will illustrate the strengths and limitations of different conformal prediction procedures for a regression problem involving line edge roughness (LER) estimation; LER affects semiconductor device performance and the yield of the manufacturing process. Low-dose images from the scanning electron microscope (SEM) are often used for roughness measurements because of relatively small acquisition times and resist shrinkage, but such images are corrupted by noise, blur, edge effects and other instrument errors. We consider prediction intervals for the deep convolutional neural network EDGENet, which was trained on a large dataset of simulated SEM images and directly estimates the edge positions from a SEM rough line image containing an unknown level of Poisson noise.

  • Components
  • 10.3389/fneur.2021.735142.s001
Data_Sheet_1.docx
  • Dec 1, 2021

Background: The prediction of aneurysm treatment outcome can help to optimize treatment strategies. Machine learning has shown positive results in many clinical areas. However, the development of such models requires expertise in machine learning, which is not an easy task for surgeons. Objectives: The recently emerged automated machine learning (AutoML) has shown promise in making machine learning more accessible to non-computer experts. We aimed to evaluate the feasibility of applying AutoML to develop machine learning models for treatment outcome prediction. Methods: Patients with aneurysms treated by endovascular treatment were prospectively recruited from 2016 to 2020. A statistical prediction model was developed using multivariate logistic regression. Two machine learning (ML) models were also developed: one manually and the other by AutoML. The three models were compared based on their area under the precision-recall curve (AUPRC) and area under the receiver operating characteristic curve (AUROC). Results: Aneurysm size, stent-assisted coiling, and posterior circulation were the three significant and independent variables associated with treatment outcome. The statistical model showed an AUPRC of 0.432 and an AUROC of 0.745. The conventional, manually trained ML model showed an improved AUPRC of 0.545 and AUROC of 0.781. The AutoML-derived ML model showed the best performance, with an AUPRC of 0.632 and AUROC of 0.832, significantly better than the other two models. Conclusions: This study demonstrated the feasibility of using AutoML to develop high-quality ML models, which may outperform statistical models and manually derived ML models. AutoML could be a useful tool that makes machine learning more accessible to clinical researchers.
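As a side note on the comparison metric, the AUROC values reported above can be computed without any ML library via the rank-sum (Mann-Whitney) identity. A self-contained sketch on simulated labels and scores:

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated labels and classifier scores (positives score higher on average)
y_true = rng.integers(0, 2, 1000)
score = np.where(y_true == 1, rng.normal(1, 1, 1000), rng.normal(0, 1, 1000))

# AUROC = probability that a random positive outranks a random negative,
# computed from the rank sum of the positive class
order = score.argsort()
ranks = np.empty(len(score))
ranks[order] = np.arange(1, len(score) + 1)
n_pos = int(y_true.sum())
n_neg = len(y_true) - n_pos
auroc = (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For these simulated score distributions the true AUROC is about 0.76; the rank-based estimate converges to it as the sample grows.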

  • Research Article
  • 10.1016/j.neunet.2025.107809
Conformalized prediction of post-fault voltage trajectories using pre-trained and finetuned attention-driven neural operators.
  • Jul 1, 2025
  • Neural networks: the official journal of the International Neural Network Society
  • Amirhossein Mollaali + 6 more

  • Research Article
  • Cited by 2
  • 10.1093/mnras/stad2080
Uncertainty quantification of the virial black hole mass with conformal prediction
  • Jul 12, 2023
  • Monthly Notices of the Royal Astronomical Society
  • Suk Yee Yong + 1 more

Precise measurements of the black hole mass are essential to gain insight into black hole and host galaxy co-evolution. A direct measure of the black hole mass is often restricted to the nearest galaxies; instead, an indirect method using the single-epoch virial black hole mass estimation is used for objects at high redshifts. However, this method is subject to biases and uncertainties, as it relies on a scaling relation derived from a small sample of local active galactic nuclei. In this study, we propose the application of conformalized quantile regression (CQR) to quantify the uncertainties of black hole predictions in a machine learning setting. We compare CQR with various prediction interval techniques and demonstrate that CQR provides a more useful prediction interval indicator. In contrast to baseline approaches for prediction interval estimation, the CQR method provides prediction intervals that adjust to the black hole mass and its related properties: it yields a tighter constraint on the prediction interval (hence more certainty) for a larger black hole mass and, accordingly, for brighter sources with broader spectral line widths. Using a combination of a neural network model and the CQR framework, the recovered virial black hole mass predictions and uncertainties are comparable to those measured from the Sloan Digital Sky Survey. The code is publicly available.
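Conformalized quantile regression as described here starts from a pair of fitted conditional quantiles and calibrates them. In this sketch the quantile "models" are the oracle quantiles of a simulated heteroscedastic process, standing in for the paper's neural network:

```python
import numpy as np

rng = np.random.default_rng(4)

# Heteroscedastic calibration data
x_cal = rng.uniform(0, 10, 1000)
y_cal = rng.normal(0, 0.2 + 0.3 * x_cal)

z = 1.645  # standard-normal 95% quantile
q_lo = lambda v: -z * (0.2 + 0.3 * v)   # 5% conditional quantile
q_hi = lambda v: z * (0.2 + 0.3 * v)    # 95% conditional quantile

# CQR conformity score: how far y falls outside [q_lo, q_hi] (negative inside)
scores = np.maximum(q_lo(x_cal) - y_cal, y_cal - q_hi(x_cal))

alpha = 0.1
n = len(scores)
qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Calibrated interval: the quantile band, widened (or shrunk) by qhat
interval = lambda v: (q_lo(v) - qhat, q_hi(v) + qhat)
```

Because the band inherits its shape from the quantile model, the calibrated intervals adapt to the input, narrow where the quantile model is confident, wide where it is not, which is the adaptivity the study exploits.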

  • Research Article
  • 10.5194/soil-11-553-2025
Using Monte Carlo conformal prediction to evaluate the uncertainty of deep-learning soil spectral models
  • Jul 22, 2025
  • SOIL
  • Yin-Chung Huang + 3 more

Abstract. Uncertainty quantification is a crucial step in the practical application of soil spectral models, particularly in supporting real-world decision making and risk assessment. While machine learning has made remarkable strides in predicting various physicochemical properties of soils using spectroscopy, its practical utility in decision making remains limited without quantified uncertainty. Despite its importance, uncertainty quantification is rarely incorporated into soil spectral models, and existing methods face significant limitations: they are either computationally demanding, fail to achieve the desired coverage of observed data, or struggle to handle out-of-domain uncertainty. This study introduces an innovative application of Monte Carlo conformal prediction (MC-CP) to quantify uncertainty in deep-learning models for predicting clay content from mid-infrared spectroscopy. We compared MC-CP with two established methods: (1) Monte Carlo dropout and (2) conformal prediction. Monte Carlo dropout generates prediction intervals for each sample and can address larger uncertainties associated with out-of-domain data. Conformal prediction, on the other hand, guarantees ideal coverage of true values but generates unnecessarily wide prediction intervals, making it overly conservative for many practical applications. Using 39 177 samples from the mid-infrared spectral library of the Kellogg Soil Survey Laboratory to build convolutional neural networks, we found that Monte Carlo dropout itself falls short of achieving the desired coverage: its 90 % prediction intervals covered the observed values in only 74 % of cases, well below the expected 90 % coverage. In contrast, MC-CP successfully combines the strengths of both methods. It achieved a prediction interval coverage probability of 91 %, closely matching the expected 90 % coverage and far surpassing the performance of Monte Carlo dropout.
Additionally, the mean prediction interval width for MC-CP was 9.05 %, narrower than the conformal prediction's 11.11 %. The success of MC-CP enhances the real-world applicability of soil spectral models, paving the way for their integration into large-scale machine learning models, such as soil inference systems, and further transforming decision making and risk assessment in soil science.
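A generic way to combine Monte Carlo dropout with conformal calibration, in the spirit of (but not identical to) the MC-CP procedure above, is to treat the spread of the stochastic forward passes as a difficulty estimate and conformalize the scaled residuals. The simulated "passes" below stand in for dropout-enabled network evaluations:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in for K dropout-enabled forward passes of a network: each "pass"
# returns the signal plus model noise (deliberately smaller than the data
# noise, so raw MC-dropout intervals undercover, as in the study above)
def mc_passes(x, k=50):
    return 2 * x + rng.normal(0, 0.5, size=(k, len(x)))

x_cal = rng.uniform(0, 10, 1000)
y_cal = 2 * x_cal + rng.normal(0, 1, 1000)

passes = mc_passes(x_cal)
mu = passes.mean(axis=0)   # MC-dropout point prediction
sd = passes.std(axis=0)    # MC-dropout uncertainty estimate

# Conformal step: scale residuals by the MC uncertainty, then calibrate so the
# interval mu(x) +/- qhat * sd(x) has marginal coverage near 1 - alpha
scores = np.abs(y_cal - mu) / sd
alpha = 0.1
n = len(scores)
qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
```

The calibration step repairs the undercoverage of the raw dropout intervals while keeping their sample-specific widths, which is the combination of strengths the abstract reports.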
