Abstract

Purpose: Accurate discrimination of benign and pathogenic rare variation remains a priority for clinical genome interpretation. State-of-the-art machine learning variant prioritization tools are imprecise and ignore important parameters defining gene–disease relationships, e.g., distinct consequences of gain-of-function versus loss-of-function variants. We hypothesized that incorporating disease-specific information would improve tool performance.

Methods: We developed a disease-specific variant classifier, CardioBoost, that estimates the probability of pathogenicity for rare missense variants in inherited cardiomyopathies and arrhythmias. We assessed CardioBoost’s ability to discriminate known pathogenic from benign variants, prioritize disease-associated variants, and stratify patient outcomes.

Results: CardioBoost has high global discrimination accuracy (precision-recall area under the curve [AUC] 0.91 for cardiomyopathies; 0.96 for arrhythmias), outperforming existing tools (4–24% improvement). CardioBoost obtains excellent accuracy (cardiomyopathies 90.2%; arrhythmias 91.9%) for variants classified with >90% confidence, and increases the proportion of variants classified with high confidence more than twofold compared with existing tools. Variants classified as disease-causing are associated with both disease status and clinical severity, including a 21% increased risk (95% confidence interval [CI] 11–29%) of severe adverse outcomes by age 60 in patients with hypertrophic cardiomyopathy.

Conclusions: A disease-specific variant classifier outperforms state-of-the-art genome-wide tools for rare missense variants in inherited cardiac conditions (https://www.cardiodb.org/cardioboost/), highlighting broad opportunities for improved pathogenicity prediction through disease specificity.
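The high-confidence metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes that a variant counts as classified "with >90% confidence" when its predicted probability of pathogenicity is at least 0.9 (called pathogenic) or at most 0.1 (called benign), and that accuracy is computed among those confident calls only. The function name, cutoffs, and example data are hypothetical, and the paper's exact definitions may differ.

```python
from typing import Dict, Sequence


def high_confidence_summary(
    probs: Sequence[float],
    labels: Sequence[int],
    pathogenic_cutoff: float = 0.9,  # assumed cutoff, not taken from the paper
    benign_cutoff: float = 0.1,      # assumed cutoff, not taken from the paper
) -> Dict[str, float]:
    """Summarize confident calls from predicted pathogenicity probabilities.

    probs  : predicted probability that each variant is pathogenic
    labels : reference labels, 1 = pathogenic, 0 = benign
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")

    confident = 0
    correct = 0
    for p, y in zip(probs, labels):
        if p >= pathogenic_cutoff:
            call = 1
        elif p <= benign_cutoff:
            call = 0
        else:
            continue  # indeterminate: not classified with high confidence
        confident += 1
        correct += int(call == y)

    total = len(probs)
    return {
        # share of all variants that receive a confident call
        "proportion_high_confidence": confident / total if total else 0.0,
        # share of confident calls that match the reference label
        "accuracy_high_confidence": correct / confident if confident else float("nan"),
    }


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the study.
    example_probs = [0.95, 0.05, 0.50, 0.92, 0.03]
    example_labels = [1, 0, 1, 0, 0]
    print(high_confidence_summary(example_probs, example_labels))
```

Reporting the proportion of confident calls alongside their accuracy matters because a tool can appear accurate simply by leaving most variants indeterminate.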

Highlights

  • The accurate prediction of the effect of a previously unseen genetic variant on disease risk is an unmet need in clinical genetics

  • Variants classified as disease-causing are associated with both disease status and clinical severity, including a 21% increased risk (95% confidence interval [CI] 11–29%) of severe adverse outcomes by age 60 in patients with hypertrophic cardiomyopathy

  • CardioBoost was compared against state-of-the-art genome-wide variant pathogenicity predictors, including M-CAP [14], REVEL [15], CADD [5], Eigen [16], and PrimateAI [17], which are reported to have leading performance in pathogenicity prediction of rare missense variants

Introduction

The accurate prediction of the effect of a previously unseen genetic variant on disease risk is an unmet need in clinical genetics. Prediction of variant pathogenicity is integrated as one line of supporting evidence to assess the clinical significance of genetic variation. Several tools have been developed to predict the effects of rare variants, using multiple functional annotations to derive scores describing the likelihood of pathogenicity. While existing genome-wide tools learn from large-scale data over the entire genome, they might compromise prediction accuracy for specific sets of genes and diseases in the following ways. First, genome-wide classification tools may not benefit from specific lines of evidence only available for a subset of well-characterized genes or diseases. We have previously shown that the addition of gene- and disease-specific evidence into a classification model improves variant interpretation in inherited cardiac diseases. Second, most genome-wide prediction tools are reported to have low specificity.

