Abstract

A promise of genomics in precision medicine is to provide individualized genetic risk predictions. Polygenic risk scores (PRS), computed by aggregating effects from many genomic variants, have been developed as a useful tool in complex disease research. However, applying PRS to predict an individual’s disease susceptibility in a clinical setting is challenging, because PRS typically provide a relative measure of risk evaluated at the level of a group of people rather than at the individual level. Here, we introduce a machine-learning technique, Mondrian Cross-Conformal Prediction (MCCP), to estimate the confidence bounds of PRS-to-disease-risk prediction. For each individual, MCCP reports a probability value conditional on disease status and gives a prediction at a desired error level. Moreover, given a user-defined prediction error rate, MCCP can estimate the proportion of the sample (coverage) with a correct prediction.

Highlights

  • Mondrian Cross-Conformal Prediction (MCCP) is a special implementation of conformal prediction (CP) in classification that can guarantee the validity of the conformal predictor for each class[27,28]

  • The training set was further randomly partitioned into n equal-sized subsets, one of which was retained as the calibration subset for calculating the MCCP probability value described by Eq (1), and the remaining n−1 subsets were used as the proper training set for model building
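The partition and calibration step in the last highlight can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Eq (1) is not reproduced in this summary, so the code uses the standard class-conditional (Mondrian) conformal p-value, and the helper names `split_folds`, `nonconformity`, and `mondrian_p_values`, as well as the score-based nonconformity measure, are hypothetical choices made for the example.

```python
import random


def split_folds(indices, n_folds, seed=0):
    """Randomly partition indices into n_folds subsets of (near-)equal size."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[k::n_folds] for k in range(n_folds)]


def nonconformity(score, label):
    # Assumed measure: a high score is unusual for a control (0),
    # a low score is unusual for a case (1).
    return score if label == 0 else -score


def mondrian_p_values(cal_scores, cal_labels, test_score, classes=(0, 1)):
    """Class-conditional conformal p-values for one test individual.

    For each candidate class, the test individual is compared only with
    calibration individuals of that same class (the Mondrian condition).
    """
    p_values = {}
    for y in classes:
        cal_alphas = [
            nonconformity(s, label)
            for s, label in zip(cal_scores, cal_labels)
            if label == y
        ]
        test_alpha = nonconformity(test_score, y)
        at_least_as_strange = sum(a >= test_alpha for a in cal_alphas)
        p_values[y] = (at_least_as_strange + 1) / (len(cal_alphas) + 1)
    return p_values


# Toy calibration subset: four controls (0) and four cases (1).
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
cal_labels = [0, 0, 0, 0, 1, 1, 1, 1]
folds = split_folds(range(10), n_folds=5)
p = mondrian_p_values(cal_scores, cal_labels, test_score=0.85)
```

In a cross-conformal setting each subset would take its turn as the calibration subset and the resulting p-values would be combined; the sketch shows a single split only.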

Summary

Introduction

A promise of genomics in precision medicine is to provide individualized genetic risk predictions. In contrast to rare disease-causing mutations, which have high penetrance, PRS are continuous measures of the liability to disease as well as probabilistic measures of the risk of developing a condition[21,22]. It is unclear what thresholds of PRS clinicians should use to assess an individual’s risk of developing a disease. In contrast to the arbitrary PRS thresholds used in the literature, MCCP, functioning as a calibrator (Fig. 1) for PRS prediction in a test sample, is able to compute the proportion of the sample (termed coverage hereafter) for which the prediction of case-control status is reliable, i.e., below a pre-specified prediction error rate. We show that, at the individual level, MCCP reports well-calibrated prediction probabilities and systematically estimates confidence bounds of PRS-to-risk prediction of human complex diseases. MCCP outperforms standard methods in accurately stratifying individuals into risk groups.
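To make the notion of coverage concrete, the sketch below turns per-individual, class-conditional p-values into prediction sets at a chosen error rate and counts how many individuals receive a single confident label. It is an illustration only: the p-values are made up, the function names `prediction_set` and `coverage` are hypothetical, and treating "coverage" as the fraction of single-label prediction sets is an assumption about the definition used in the paper.

```python
def prediction_set(p_values, epsilon):
    """Keep every label whose p-value exceeds the error rate epsilon."""
    return {label for label, p in p_values.items() if p > epsilon}


def coverage(all_p_values, epsilon):
    """Fraction of individuals whose prediction set contains exactly one label.

    Assumption: an individual counts as covered only when the prediction
    is a single, unambiguous case (1) or control (0) call.
    """
    sets = [prediction_set(p, epsilon) for p in all_p_values]
    return sum(len(s) == 1 for s in sets) / len(sets)


# Made-up p-values for four individuals: {label: p-value}.
sample = [
    {0: 0.90, 1: 0.02},  # confident control
    {0: 0.03, 1: 0.80},  # confident case
    {0: 0.40, 1: 0.30},  # ambiguous at a strict error rate
    {0: 0.01, 1: 0.04},  # conforms to neither class
]

strict = coverage(sample, epsilon=0.05)
relaxed = coverage(sample, epsilon=0.35)
```

Here a larger tolerated error rate shrinks the prediction sets, so more individuals receive a single label; this is the trade-off between error rate and coverage that the text describes.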
