Abstract

Familial hypercholesterolemia (FH) is an underdiagnosed dominant genetic condition that affects approximately 0.4% of the population and confers up to a 20-fold increased risk of coronary artery disease if untreated. Simple screening strategies have false positive rates greater than 95%. As part of the FH Foundation's FIND FH initiative, we developed a classifier to identify potential FH patients using electronic health record (EHR) data at Stanford Health Care. We trained a random forest classifier using data from known patients (n = 197) and matched non-cases (n = 6590). Our classifier obtained a positive predictive value (PPV) of 0.88 and a sensitivity of 0.75 on a held-out test set. We evaluated the accuracy of the classifier's predictions by chart review of 100 patients at risk of FH who were not included in the original dataset. The classifier correctly flagged 84% of patients at the highest probability threshold, with performance decreasing as the threshold was lowered. In external validation on 466 FH patients (236 with genetically proven FH) and 5000 matched non-cases from the Geisinger Healthcare System, our FH classifier achieved a PPV of 0.85. Our EHR-derived FH classifier is effective in finding candidate patients for further FH screening. Such machine-learning-guided strategies can lead to effective identification of the highest-risk patients for enhanced management strategies.

Highlights

  • Familial hypercholesterolemia (FH) is an autosomal dominant condition with an estimated prevalence of approximately 1 in 250,[1] making it among the most common morbid monogenic disorders

  • As part of the FH Foundation's FIND (Flag, Identify, Network, Deliver) FH initiative, we report the development and internal validation of a supervised machine-learning algorithm to identify probable FH cases based on electronic health record (EHR) data from Stanford Health Care, as well as the external validation of this classifier using EHR data from the Geisinger Healthcare System

  • We report the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), which is more informative for low-prevalence outcomes.[15]
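The point about low-prevalence outcomes can be illustrated with a short calculation. The sketch below is not taken from the paper. It applies Bayes' rule to show that a fixed operating point on the ROC curve (fixed sensitivity and specificity) yields very different precision (PPV) depending on how common the condition is, which is one reason precision-based metrics are preferred for rare outcomes. The sensitivity of 0.75 comes from the abstract. The specificity of 0.99 is a hypothetical value chosen only for illustration, and the function name `ppv` is ours.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.75  # reported in the abstract
SPECIFICITY = 0.99  # ASSUMPTION: hypothetical value, not reported in this excerpt

prevalences = {
    "general population (about 1 in 250)": 1 / 250,
    "training cohort (197 cases, 6590 non-cases)": 197 / (197 + 6590),
}

for label, prev in prevalences.items():
    # Same ROC operating point, different prevalence, different precision.
    print(f"{label}: PPV = {ppv(SENSITIVITY, SPECIFICITY, prev):.2f}")
```

Because sensitivity and specificity do not depend on prevalence, the ROC curve looks the same in both settings, while precision drops sharply when cases are rare. The numbers printed here depend entirely on the assumed specificity and should not be read as estimates of the classifier's real-world performance.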

Summary

Introduction

Familial hypercholesterolemia (FH) is an autosomal dominant condition with an estimated prevalence of approximately 1 in 250,[1] making it among the most common morbid monogenic disorders. Guidelines recommend applying diagnostic criteria (e.g., Dutch Lipid Clinic Network (DLCN) or Simon-Broome) in adults for whom there is high clinical suspicion, which is usually based on untreated LDL-C values >190 mg/dl plus a positive family history of early-onset ASCVD.[1,5,6] However, there are significant limitations to this approach. The strategy is non-specific: while high LDL-C is a cardinal feature of FH, fewer than 5% of adults with an LDL-C >190 mg/dl will be found to harbor a causal FH gene mutation.[3] In addition, it relies largely on the availability of untreated LDL-C values and adequate family history information, either or both of which are often unavailable to the healthcare provider. The performance of the classifier, which achieves a PPV of >0.8 across two independent datasets, and the resulting reduction in testing cost and case-finding burden suggest that applying this classifier could lead to more effective targeting of these high-risk patients for enhanced evaluation and intervention.
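To make the case-finding burden concrete, the expected number of flagged patients who must be evaluated to find one true case is roughly 1 / PPV. The sketch below uses the two bounds quoted in the text: under 5% for the LDL-C threshold alone and over 0.8 for the classifier. The comparison is only approximate, because the 5% figure refers to carrying a causal gene mutation while the classifier's PPV was measured against its own case definition. The function name `evaluations_per_case` is hypothetical.

```python
import math


def evaluations_per_case(ppv: float) -> float:
    """Expected number of flagged patients evaluated per true case found (1 / PPV)."""
    if not 0.0 < ppv <= 1.0:
        raise ValueError("ppv must be in the interval (0, 1]")
    return 1.0 / ppv


LDL_ONLY_PPV_UPPER = 0.05    # text: fewer than 5% of adults with LDL-C >190 mg/dl
CLASSIFIER_PPV_LOWER = 0.80  # text: PPV of >0.8 across two independent datasets

ldl_burden = evaluations_per_case(LDL_ONLY_PPV_UPPER)           # at least this many
classifier_burden = evaluations_per_case(CLASSIFIER_PPV_LOWER)  # at most this many

print(f"LDL-C threshold alone: at least {ldl_burden:.0f} evaluations per case")
print(f"Classifier: at most {classifier_burden:.2f} evaluations per case")
print(f"Reduction factor: at least {ldl_burden / classifier_burden:.0f}x")
```

Under these bounds, follow-up testing would be concentrated on far fewer patients per confirmed case, which is the reduction in testing cost and case-finding burden the text refers to.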
