Abstract

Background

Alzheimer’s disease (AD) is the most common late-onset neurodegenerative disease. Identifying individuals at increased risk of developing AD is important for early intervention. Risk prediction models are typically based on a limited number of predictors, possibly with sub-optimal performance. Here, we explore an explainable machine learning (ML) framework, XGBoost and SHapley Additive exPlanations (SHAP) values, for AD risk prediction, which can handle a large number of predictors and output the impact and importance of each predictor.

Method

We developed an XGBoost model that aggregates polygenic risk scores (PRSs), which include both PRS for AD risk and PRS for age at onset of AD, baseline individual characteristics (e.g., non-genetic factors), and information from electronic health records for predicting incident AD. The PRSs were derived using summary statistics from genome-wide association studies in the Alzheimer’s Disease Genetics Consortium (ADGC) dataset (n = 19,918). The model was applied to 457,936 white participants in UK Biobank to predict development of AD within 10 years after the baseline visit (n = 2,177 developed AD). We further used SHAP values to explain the relative information in model predictors.

Result

For participants of age 40 and older, the area under the receiver operating characteristic curve (AUC) for AD risk prediction was over 0.880. PRSs ranked second to age (the best predictor) in feature importance. For subjects of age 65 and above, PRSs for AD were the most important features. Our ML model not only identified traditional risk factors for AD, such as age, education, income, body mass index, diabetes, and blood pressure, but also identified predictors from electronic health records that are not typically considered in traditional prediction models, including urinary tract infection, syncope and collapse, chest pain, disorientation and hypercholesterolaemia, for developing AD. Furthermore, SHAP values aided the ranking of feature importance and model explanation.

Conclusion

Our ML model improves the accuracy of AD risk prediction by efficiently exploring numerous predictors. PRSs play the most important role in developing AD in individuals of age 65 and older. In application, the model also identified novel feature patterns for AD.
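
As an illustration of what SHAP values report, the sketch below computes exact Shapley attributions for a toy risk function in plain Python. It is not the study's XGBoost model and does not use the `shap` library. The function `toy_risk`, its coefficients, the three feature names, and the baseline individual are all hypothetical, chosen only to show how a prediction is split into per-predictor contributions.

```python
from itertools import permutations


def toy_risk(x):
    """Hypothetical risk score (NOT the study's model; coefficients are made up)."""
    score = 0.02 * (x["age"] - 40) + 0.3 * x["prs"] + 0.1 * x["diabetes"]
    if x["age"] >= 65:
        # Assumed interaction: the PRS counts for more at older ages.
        score += 0.2 * x["prs"]
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to one baseline input.

    Each feature's marginal effect on the model output is averaged over every
    order in which features can be switched from the baseline value to x's value.
    """
    features = list(x)
    phi = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        current = dict(baseline)
        previous = model(current)
        for f in order:
            current[f] = x[f]          # switch this feature to the individual's value
            updated = model(current)
            phi[f] += updated - previous
            previous = updated
    return {f: total / len(orders) for f, total in phi.items()}


# Hypothetical reference individual and hypothetical individual to explain.
baseline = {"age": 55, "prs": 0.0, "diabetes": 0}
person = {"age": 70, "prs": 1.5, "diabetes": 1}

phi = shapley_values(toy_risk, person, baseline)

# Rank predictors by absolute contribution, analogous to a feature-importance ranking.
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
for f in ranking:
    print(f"{f}: {phi[f]:+.3f}")
```

The contributions sum to the difference between the individual's score and the baseline score, which is the additivity property that makes these attributions usable for ranking predictors. Enumerating all feature orderings is only feasible for a handful of features; models with many predictors rely on approximations or tree-specific algorithms instead.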

