Abstract

Background

Real-world data spanning decades of medical records are increasingly available, alongside the growing adoption of machine learning in healthcare research. We evaluated the performance of machine learning models in predicting the risk of Alzheimer’s disease (AD) using data from the Finnish national registers.

Methods

We conducted a case-control study using data from the Finnish MEDALZ (Medication use and Alzheimer’s disease) study. Altogether, 56,741 individuals with an incident AD diagnosis (age ≥ 65 years at diagnosis and born after 1922) and their 1:1 age-, sex-, and region of residence-matched controls were included. Associations between AD and risk factors, evaluated at different age periods (45–54, 55–64, and 65+ years), were assessed with logistic regression. The predictive accuracy of logistic regression was compared with that of seven machine learning models (L1-regularized logistic regression, naive Bayes, decision tree, random forest, multilayer perceptron, XGBoost, and LightGBM).

Findings

Of the cases and controls, 63.5 % were female, and the mean age was 79.1 years (SD = 5.1). The strongest associations with AD were observed for head injuries at age 55–64 (OR 1.33, 95 % CI 1.19–1.48) and at 65+ (OR 1.31, 95 % CI 1.23–1.40), followed by antidepressant use at 55–64 (OR 1.30, 95 % CI 1.22–1.38) and antipsychotic use at 65+ (OR 1.27, 95 % CI 1.19–1.35). The predictive accuracies of all models were low, with the best performance (AUC 0.603) observed for random forest in predicting AD onset at age 65–69.

Interpretation

Although significant associations were identified between many risk factors and AD, the low predictive accuracies suggest that specialised healthcare diagnosis data are not sufficient for predicting AD and that linkage with other data sources is needed.
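The abstract reports two kinds of quantity: odds ratios (OR) with 95 % confidence intervals, and the area under the ROC curve (AUC). The sketch below illustrates how each can be computed in its simplest form. It is not the study's analysis: the OR here is an unadjusted estimate from a 2×2 table with a Wald interval, whereas the study used logistic regression on matched data, and the function names and all input numbers are hypothetical, chosen only for illustration.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Simplified stand-in for a regression-based OR; z=1.96 gives
    an approximate 95 % interval. All four counts must be > 0.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def auc(case_scores, control_scores):
    """AUC as the probability that a randomly chosen case receives a
    higher predicted risk than a randomly chosen control (ties count 0.5).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical counts and scores, for illustration only.
or_, lo, hi = odds_ratio_ci(40, 60, 30, 70)
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
print(f"AUC {auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4]):.3f}")
```

Read this way, an AUC of 0.5 corresponds to ranking cases and controls no better than chance, and 1.0 to perfect separation, which is why a best value of 0.603 is described as low.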

