Abstract

Background: Alzheimer’s disease (AD) is a highly heterogeneous disease with diverse trajectories and outcomes observed in clinical populations. Understanding this heterogeneity can enable better treatment, prognosis and disease management. Studies to date have mainly used imaging or cognition data and have been limited in terms of data breadth and sample size. Here we examine the clinical heterogeneity of Alzheimer’s disease patients using electronic health records (EHR) to identify and characterise disease subgroups using multiple clustering methods, identifying clusters which are clinically actionable.

Methods: We identified AD patients in primary care EHR from the Clinical Practice Research Datalink (CPRD) using a previously validated rule-based phenotyping algorithm. We extracted a range of comorbidities, symptoms and demographic features as patient features. We evaluated four different clustering methods (k-means, kernel k-means, affinity propagation and latent class analysis) to cluster Alzheimer’s disease patients. We compared clusters on clinically relevant outcomes and evaluated each method using measures of cluster structure, stability, efficiency of outcome prediction and replicability in external data sets.

Results: We identified 7,913 AD patients with a mean age of 82; 66.2% were female. We included 21 features in our analysis. We observed 5, 2, 5 and 6 clusters in k-means, kernel k-means, affinity propagation and latent class analysis respectively. K-means was found to produce the most consistent results based on four evaluative measures. One consistent cluster appeared in three of the four methods. It was composed predominantly of female patients with younger disease onset (43% between ages 42–73) and diagnoses of depression and anxiety, and it showed a quicker rate of progression compared to the average across other clusters.

Conclusion: Each clustering approach produced substantially different clusters, and k-means performed the best out of the four methods based on the four evaluative criteria. However, the consistent appearance of one particular cluster across three of the four methods potentially suggests the presence of a distinct disease subtype that merits further exploration. Our study underlines the variability of the results obtained from different clustering approaches and the importance of systematically evaluating different approaches for identifying disease subtypes in complex EHR.
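The abstract reports that one cluster recurred across three of the four methods but does not say how that correspondence was measured. As a minimal sketch, assuming each method yields one label per patient, clusters from two methods could be matched by Jaccard overlap of their members. The function names and the toy labels below are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def clusters_as_sets(labels):
    """Group patient indices by the cluster label each one received."""
    groups = defaultdict(set)
    for patient_index, label in enumerate(labels):
        groups[label].add(patient_index)
    return dict(groups)


def jaccard(a, b):
    """Jaccard overlap of two non-empty sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def best_matches(labels_a, labels_b):
    """For each cluster of method A, find the most similar cluster of method B."""
    sets_a = clusters_as_sets(labels_a)
    sets_b = clusters_as_sets(labels_b)
    matches = {}
    for name_a, members_a in sets_a.items():
        name_b = max(sets_b, key=lambda n: jaccard(members_a, sets_b[n]))
        matches[name_a] = (name_b, jaccard(members_a, sets_b[name_b]))
    return matches


# Toy labels for eight hypothetical patients (illustrative data only).
method_one = [0, 0, 0, 1, 1, 2, 2, 2]
method_two = ["x", "x", "x", "y", "z", "z", "z", "y"]
matches = best_matches(method_one, method_two)
for name_a, (name_b, score) in matches.items():
    print(f"cluster {name_a} best matches cluster {name_b} (Jaccard {score:.2f})")
```

A cluster that keeps a high overlap score against every other method would be a candidate for the kind of consistent subgroup described above; a chance-corrected measure such as the adjusted Rand index could be used instead when comparing whole partitions.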

Highlights

  • Alzheimer’s disease (AD) is a neurodegenerative disorder which affects 850,000 people in the UK with 95% of cases diagnosed in patients over 65

  • Cohort and data preprocessing: We identified 10,065 AD patients and 9,124 Unspecified Dementia (UD) patients from 11.3 million patient records that satisfied the cohort criteria; the dataset is 66% female (Table 1, Additional file 1: Fig. 1)

  • Clustering results, k-means: The optimal number of clusters identified when using k-means was 5 (Additional file 1: Fig. 2): Anxiety and Depression, Early Onset and Smoking cluster; Nontypical AD cluster; Typical AD cluster; CVD cluster; and a cluster of Men with Memory Problems and Cancer (Additional file 1: Fig. 3, Additional file 1: Table 2)
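The highlights state that an optimal number of k-means clusters was identified but not which criterion was used. The sketch below assumes one common choice, the mean silhouette score, and uses a deterministic farthest-first initialisation so the result is reproducible. The data are synthetic 2-D points rather than patient features, and every name here is hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def farthest_first_centers(points, k):
    """Deterministic init: start at the first point, then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette over all points; singleton clusters score 0.
    Requires at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, candidates):
    """Return the candidate k (each >= 2) with the highest mean silhouette."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Synthetic data: three tight, well-separated groups of five points each.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

best_k, scores = choose_k(points, range(2, 6))
print("selected k:", best_k)
```

On real EHR features the groups are rarely this well separated, so in practice the silhouette curve is usually inspected alongside other criteria, such as stability across resamples, rather than trusted on its own.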


Summary

Introduction

Alzheimer’s disease (AD) is a neurodegenerative disorder which affects 850,000 people in the UK, with 95% of cases diagnosed in patients over 65. It is a biologically and clinically heterogeneous disease which varies in symptoms and rate of progression. In understanding this heterogeneity, it is important to examine the factors which vary and the causes of that variation. Diverse trajectories and outcomes are observed in clinical populations, and understanding this heterogeneity can enable better treatment, prognosis and disease management. We examine the clinical heterogeneity of Alzheimer’s disease patients using electronic health records (EHR) to identify and characterise disease subgroups using multiple clustering methods, identifying clusters which are clinically actionable.

Source: Alexander et al., BMC Medical Informatics and Decision Making (2021) 21:343.
