Abstract

We use UK Biobank (UKB) data to train predictors for 65 blood and urine markers, including HDL, LDL, lipoprotein A, and glycated haemoglobin, from SNP genotype. For example, our Polygenic Score (PGS) predictor correlates ∼0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait that has yet been produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information); we call these predictors biomarker risk scores (BMRS). Individuals who are at high risk (e.g., odds ratio >5× the population average) can be identified for conditions such as coronary artery disease (AUC∼0.75), diabetes (AUC∼0.95), hypertension, liver and kidney problems, and cancer using biomarkers alone. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ∼10 biomarkers and performs in UKB evaluation as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (PRS: risk conditional on genotype) for common diseases to the risk predictors that result from the concatenation of the learned functions BMRS and PGS, i.e., applying the BMRS predictors to the PGS output.
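The last step above, applying the BMRS predictors to the PGS output, can be illustrated with a minimal sketch. It assumes, purely for illustration, that each PGS is a linear function of SNP allele counts and that the BMRS is a logistic function of biomarker levels; the abstract does not specify either model form. All function names, weights, and toy values below are hypothetical and are not taken from the paper.

```python
import numpy as np


def pgs_predict(genotypes, snp_weights, intercepts):
    """Predict biomarker levels from genotype (assumed linear model).

    genotypes:   (n_individuals, n_snps) allele counts in {0, 1, 2}
    snp_weights: (n_snps, n_biomarkers) hypothetical learned effect sizes
    intercepts:  (n_biomarkers,) hypothetical baseline levels
    """
    return genotypes @ snp_weights + intercepts


def bmrs_predict(biomarkers, marker_weights, bias):
    """Map biomarker levels to a disease risk in (0, 1) (assumed logistic model)."""
    z = biomarkers @ marker_weights + bias
    return 1.0 / (1.0 + np.exp(-z))


def gbmrs_predict(genotypes, snp_weights, intercepts, marker_weights, bias):
    """Genetic biomarker risk score: the BMRS applied to PGS-predicted biomarkers."""
    predicted = pgs_predict(genotypes, snp_weights, intercepts)
    return bmrs_predict(predicted, marker_weights, bias)


# Toy data: 2 individuals, 3 SNPs, 2 biomarkers (values are made up).
genotypes = np.array([[0, 1, 2],
                      [2, 0, 1]])
snp_weights = np.array([[0.5, 0.0],
                        [0.0, 1.0],
                        [-0.5, 0.5]])
intercepts = np.array([1.0, 0.0])
marker_weights = np.array([1.0, -1.0])
bias = 0.0

predicted_markers = pgs_predict(genotypes, snp_weights, intercepts)
risk = gbmrs_predict(genotypes, snp_weights, intercepts, marker_weights, bias)
print(predicted_markers)
print(risk)
```

In this sketch the risk for an individual depends on genotype only through the predicted biomarker levels, which is the sense in which a gBMRS differs from a PRS trained directly on disease status.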

Highlights

  • Modern machine learning (ML) methods have opened the door to using high dimensional inputs to predict health outcomes and risk

  • We present in separate subsections the detailed methods used for Polygenic Score (PGS), Biomarker Risk Scores (BMRS) and genetic biomarker risk scores (gBMRS), respectively, and we end with the methods for comparing our atherosclerotic cardiovascular disease (ASCVD) predictor with the clinically employed ASCVD Risk Estimator

  • As in the Materials and Methods section, we present the results for PGS, BMRS, gBMRS, and the ASCVD comparison in separate subsections



Introduction

Modern machine learning (ML) methods have opened the door to using high dimensional inputs to predict health outcomes and risk. [...] Intercollegiate Guidelines Network [25]) that physicians should use risk scores based on statistical summaries of biomarkers. Examples of such scores include Framingham [26,27], SCORE [28], ASSIGN–SCORE [29], QRISK1 [30], QRISK2 [31], QRISK3 [32], PROCAM [33], Pooled Cohort Studies Equations [34,35,36], CUORE [37], Globorisk [38], Reynolds risk score [39,40], World Health Organization (WHO) risk chart [41,42], MyRisk_stroke calculator [43], NIPPON [44], and UKPDS risk engine [45,46]. Although early attempts to identify key genetic risk markers had some missteps, Reference [47] argues that, thanks to new methods and larger datasets, genetic risk scores have developed enough to begin being employed in clinical practice (e.g., [22]).
