Translating polygenic risk scores for clinical use by estimating the confidence bounds of risk prediction

Jiangming Sun, Alfonso Buil, Gunnar Engström, David M Hougaard, Marju Orho-Melander, Kasper Lage, Olle Melander, Yunpeng Wang, Yan Borné, Inge Amlien, Anders D Børglum, Marcus Jones, Thomas Werge, Lasse Folkersen, Aris Baras, Luca Andrea Lotta

doi:10.1038/s41467-021-25014-7

Abstract

A promise of genomics in precision medicine is to provide individualized genetic risk predictions. Polygenic risk scores (PRS), computed by aggregating effects from many genomic variants, have been developed as a useful tool in complex disease research. However, applying PRS to predict an individual’s disease susceptibility in a clinical setting is challenging, because PRS typically provide a relative measure of risk evaluated at the level of a group of people rather than at the individual level. Here, we introduce a machine-learning technique, Mondrian Cross-Conformal Prediction (MCCP), to estimate the confidence bounds of PRS-to-disease-risk prediction. MCCP can report, for each individual, a probability value conditional on disease status and give a prediction at a desired error level. Moreover, with a user-defined prediction error rate, MCCP can estimate the proportion of the sample (coverage) with a correct prediction.

Highlights

A promise of genomics in precision medicine is to provide individualized genetic risk predictions
Mondrian Cross-Conformal Prediction (MCCP) is a special implementation of conformal prediction (CP) in classification that can guarantee the validity of the conformal predictor for each class[27,28]
The training set was further randomly partitioned into n equal-sized subsets, one of which was retained as the calibration subset for calculating the MCCP probability value described by Eq (1), and the remaining n−1 subsets were used as the proper training set for model building
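The partitioning described in the last highlight can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: Eq (1) is not reproduced in this summary, so the code uses the standard label-conditional (Mondrian) conformal p-value, rotates the calibration role across the n subsets, and averages the per-fold values. The midpoint "model", the nonconformity function, the averaging step, and all function names (`simulate_prs`, `fit_midpoint`, `nonconformity`, `mccp_p_values`) are hypothetical stand-ins chosen for brevity.

```python
import random
from statistics import mean


def simulate_prs(n_per_class=200, seed=1):
    """Synthetic PRS values (assumption): cases ~ N(1, 1), controls ~ N(-1, 1)."""
    rng = random.Random(seed)
    prs = [rng.gauss(1.0, 1.0) for _ in range(n_per_class)]
    prs += [rng.gauss(-1.0, 1.0) for _ in range(n_per_class)]
    labels = [1] * n_per_class + [0] * n_per_class  # 1 = case, 0 = control
    return prs, labels


def fit_midpoint(prs, labels):
    """Toy stand-in for model building: midpoint between the class means.
    Requires at least one case and one control in the proper training set."""
    cases = [x for x, y in zip(prs, labels) if y == 1]
    controls = [x for x, y in zip(prs, labels) if y == 0]
    return (mean(cases) + mean(controls)) / 2.0


def nonconformity(x, label, midpoint):
    """Larger value means x looks less like `label`."""
    return (midpoint - x) if label == 1 else (x - midpoint)


def mccp_p_values(train_prs, train_labels, test_x, n_folds=5, seed=0):
    """Return {label: p-value} for one test individual."""
    idx = list(range(len(train_prs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]  # n near-equal subsets
    per_fold = {0: [], 1: []}
    for cal in folds:
        cal_set = set(cal)
        proper = [i for i in idx if i not in cal_set]  # remaining n-1 subsets
        mid = fit_midpoint([train_prs[i] for i in proper],
                           [train_labels[i] for i in proper])
        for label in (0, 1):
            # Mondrian step: calibrate only against examples of the same class.
            cal_scores = [nonconformity(train_prs[i], label, mid)
                          for i in cal if train_labels[i] == label]
            a_test = nonconformity(test_x, label, mid)
            n_ge = sum(1 for a in cal_scores if a >= a_test)
            per_fold[label].append((n_ge + 1) / (len(cal_scores) + 1))
    # Averaging across folds is an assumption made for this sketch.
    return {label: mean(ps) for label, ps in per_fold.items()}
```

Calling `mccp_p_values(prs, labels, 3.0)` on the synthetic data returns one p-value per class; a high PRS should yield a larger p-value for the case label than for the control label.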

Summary

Introduction

A promise of genomics in precision medicine is to provide individualized genetic risk predictions. In contrast to rare disease-causing mutations, which have high penetrance, PRS are continuous measures of the liability to disease as well as probabilistic measures of the risk of developing a condition[21,22]. It is unclear what thresholds of PRS clinicians should use to assess an individual’s risk of developing a disease. In contrast to the arbitrary PRS thresholds used in the literature, MCCP, functioning as a calibrator (Fig. 1) for PRS prediction in a test sample, is able to compute the proportion of the sample (termed coverage hereafter) for which the prediction of case-control status is reliable, i.e., below a pre-specified prediction error rate. We show that, at the individual level, MCCP reports well-calibrated prediction probabilities and systematically estimates confidence bounds of PRS-to-risk prediction of human complex diseases. MCCP outperforms standard methods in accurately stratifying individuals into risk groups.
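To make the notion of coverage concrete, the following Python sketch turns per-individual p-values into prediction sets at a chosen error rate. It is an illustration under assumptions rather than the paper's procedure: it takes coverage to mean the fraction of individuals who receive exactly one label (case or control), and the function names and the example p-values are hypothetical.

```python
def prediction_set(p_control, p_case, error_rate):
    """Keep every label whose p-value exceeds the chosen error rate."""
    labels = set()
    if p_control > error_rate:
        labels.add("control")
    if p_case > error_rate:
        labels.add("case")
    return labels


def coverage(p_values, error_rate):
    """Fraction of individuals with a single-label prediction.

    `p_values` is a list of (p_control, p_case) pairs. Sets with both
    labels (uncertain) or no label count as not covered; this definition
    is an assumption made for the sketch.
    """
    single = sum(
        1 for p_control, p_case in p_values
        if len(prediction_set(p_control, p_case, error_rate)) == 1
    )
    return single / len(p_values)


# Made-up p-values for four individuals, for illustration only.
example = [(0.02, 0.90), (0.60, 0.03), (0.40, 0.30), (0.01, 0.04)]
```

With these made-up values, a stricter error rate leaves more individuals with two plausible labels, so fewer receive a single-label prediction; relaxing the error rate trades confidence for coverage.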

Methods

Results

Conclusion