Abstract

Background
Frontotemporal dementia (FTD) is a neurodegenerative syndrome with distinct subtypes, each with its own clinical presentation and neuropathologic features. Although FTD is highly heritable, it is frequently undiagnosed and its genetic causes are rarely determined, which limits what is known about the medical phenome associated with FTD. Here, we combine machine-learning (ML) strategies to improve detection of FTD in a large electronic health record (EHR) and to uncover genetic factors for FTD.

Method
We identified 123 FTD cases and 615 matched controls via diagnostic codes and confirmed them by manual expert review of medical notes in Vanderbilt University Medical Center’s (VUMC) EHR. We then applied a feature selection algorithm that calculated the frequency of each phecode within a randomly generated, demographically matched control set. A p-value denoting the enrichment of each phecode was calculated from the observed phecode frequency relative to the permutation-derived null distribution. Twenty-eight phecodes were enriched in FTD cases relative to controls. Top predictors included “Neurological disorders”, “Memory loss”, “Aphasia”, and “Parkinson’s disorder”. We then developed a classification and regression tree model that used the enriched phecodes to label an undiagnosed case set in BioVU, VUMC’s EHR-linked DNA databank. The model was trained on 96 cases and 481 controls, with 28 cases and 139 controls reserved for testing.

Result
In the testing set, 19 FTD cases were classified as cases and 5 were misclassified as controls; 135 controls were classified as controls and 4 were misclassified as cases, giving a positive predictive value (PPV) of 82%. Applying this model in BioVU yielded an expanded set of inferred FTD cases, which is currently being used in genomic analyses to identify causal variants. We have identified a genomic segment on chromosome 1 (146.9-147.3 Mb) spanning ACP6 and GJA5 that is shared in significant excess among FTD cases. These genes have been associated with intellectual disability, aphasia, and autism.

Conclusion
Our results demonstrate the potential of machine learning-based approaches for case acquisition in dementia subtypes that are often underdiagnosed in medical record databases, and illustrate how inferred cases can be leveraged for novel gene discovery.
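The reported PPV can be rechecked from the four test-set counts given in the abstract. The following is a minimal sketch, not the authors' code: the `ConfusionCounts` class and its method names are hypothetical, and only the counts themselves (19, 5, 135, 4) come from the text. Sensitivity, specificity, and NPV are derived from the same counts for context; the abstract itself reports only PPV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for a binary classifier's test-set counts."""
    tp: int  # FTD cases classified as cases
    fn: int  # FTD cases misclassified as controls
    tn: int  # controls classified as controls
    fp: int  # controls misclassified as cases

    def ppv(self) -> float:
        # Positive predictive value: share of predicted cases that are true cases.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: share of predicted controls that are true controls.
        return self.tn / (self.tn + self.fn)

    def sensitivity(self) -> float:
        # Share of true cases that the model recovers.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of true controls that the model leaves as controls.
        return self.tn / (self.tn + self.fp)


# Counts exactly as reported for the testing set in the abstract.
reported = ConfusionCounts(tp=19, fn=5, tn=135, fp=4)

if __name__ == "__main__":
    for name in ("ppv", "npv", "sensitivity", "specificity"):
        print(f"{name}: {getattr(reported, name)():.3f}")
```

With these counts, PPV is 19 / (19 + 4), or about 0.826, in line with the 82% stated in the abstract.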
