Abstract

Background: Alzheimer’s Disease (AD) is a neurodegenerative disorder of aging that is difficult to diagnose. Electronic health records (EHR) are a valuable source of longitudinal data for disease phenotyping and prediction. Knowledge graphs, such as SPOKE (Scalable Precision Medicine Open Knowledge Engine), help derive biological meaning from phenotypic data. Here we apply machine learning to EHR data to predict and better understand AD.

Method: AD patients and controls were identified from the UCSF EHR. The index time was defined as the first dementia diagnosis or drug record among the AD patients, and as 1 year before the last visit among controls. Controls matched on demographics and visit-related factors were identified at five time points relative to the index time (-7 years, -5 years, -3 years, -1 year, and -1 day) at a 1:8 ratio. Random forest (RF) classification models for AD were trained on clinical data (conditions, drugs, abnormal measurements) and evaluated with bootstrapped AUROCs on a 30% held-out set. Top features mapped to phecodes were ranked by average impurity decrease. Networks were created from the SPOKE knowledge graph by identifying shortest paths between the top 25 model features and AD, and associations were validated with genetic colocalization analysis on shared variants in the UK Biobank.

Result: From the UCSF EHR, 749 AD patients and 250,545 controls were identified with clinical concepts 7 years prior to each individual’s index time. RF models trained on data from AD patients and matched controls achieved AUROCs ranging from 0.58 (-7 years) to 0.77 (-1 day). Top phecode features from the models include those that are important earlier (osteoarthritis, allergic rhinitis), those that grow in importance closer to the index time (dizziness, vitamin D deficiency, memory loss), and those that remain important throughout (hyperlipidemia) (Figure 1). Knowledge networks highlight shared biological relationships between top features (e.g., hyperlipidemia, osteoporosis), including identification of relevant genes (e.g., APOE, PSEN1/2, INS, AKT1, HFE) (Figure 2). Colocalization analysis of LDL cholesterol and AD confirms a commonly associated SNP in APOE.

Conclusion: EHR data together with knowledge networks provide an opportunity to identify early clinical predictors of AD diagnosis, with utility for early intervention, hypothesis generation, and biological insight.
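The abstract does not say how the bootstrapped AUROCs were computed, so the following is only a minimal sketch of one common approach: resample the held-out set with replacement, recompute the AUROC each time, and report the mean with a percentile interval. The function names `auroc` and `bootstrap_auroc`, the default of 1,000 resamples, the 95% percentile interval, and the input format (0/1 labels plus model scores) are all illustrative assumptions, not details taken from the study.

```python
import random


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties.

    labels: list of 0/1 ints (1 = case, 0 = control); scores: model outputs.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the group of tied scores starting at position i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one case and one control")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc(labels, scores, n_boot=1000, seed=0):
    """Return (mean, lower, upper) AUROC over bootstrap resamples.

    Assumption: a simple percentile interval (2.5th to 97.5th percentile).
    Resamples that contain only one class are skipped and redrawn.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:
            stats.append(auroc(y, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(0.025 * n_boot)]
    upper = stats[min(n_boot - 1, int(0.975 * n_boot))]
    return sum(stats) / n_boot, lower, upper
```

In practice the `scores` would be the predicted AD probabilities from a trained classifier on the 30% held-out set; here they are left as a plain list so the sketch runs with the standard library alone.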

