Abstract

Dementia affects approximately 50 million people worldwide today, the majority suffering from Alzheimer’s disease (AD). The availability of long-term patient data is one of the most important prerequisites for a better understanding of diseases. Worldwide, many prospective, longitudinal cohort studies have been initiated to understand AD. However, this approach takes years to enroll and follow up with a substantial number of patients, resulting in a current lack of data. This raises the question of whether clinical routine datasets could be utilized to extend collected registry data. It is, therefore, necessary to assess what kind of information is available in memory clinic routine databases. We performed such an assessment using the University Hospital Bonn as an example. Whereas a number of data items are available in machine-readable formats, additional valuable information is stored in textual documents. Extracting information from such documents is only possible via text mining methods. Therefore, we set up modular, rule-based text mining workflows requiring minimal sets of training data. The system achieves F1-scores over 95% for the most relevant classes, i.e., memory disturbances from medical reports and quantitative scores from semi-structured neuropsychological test protocols. Thus, we created a machine-readable core dataset covering more than 8,000 patient visits over a ten-year period.
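To illustrate the kind of rule-based extraction described above, the sketch below pulls quantitative scores out of semi-structured test-protocol lines with a single hand-written rule. The test names, field layout, and pattern are hypothetical examples, not the authors' actual rules or data.

```python
import re

# Hypothetical rule: match lines like "MMSE: 24/30" or "DemTect 11/18".
# Test names and the score format are illustrative assumptions only.
SCORE_RULE = re.compile(
    r"(?P<test>MMSE|DemTect)\s*[:=]?\s*(?P<score>\d{1,2})\s*/\s*(?P<max>\d{1,2})"
)

def extract_scores(text: str) -> list[dict]:
    """Apply the rule line by line and collect structured hits."""
    hits = []
    for line in text.splitlines():
        m = SCORE_RULE.search(line)
        if m:
            hits.append({
                "test": m.group("test"),
                "score": int(m.group("score")),
                "max": int(m.group("max")),
            })
    return hits

protocol = "MMSE: 24/30\nDemTect 11/18"
print(extract_scores(protocol))
```

Because such rules require no labeled training corpus beyond a handful of validation examples, they fit the minimal-training-data setting the abstract describes.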

Highlights

  • For translational medicine, there is a tremendous need for broad, longitudinal health-related phenotype data

  • We extracted Alzheimer’s disease (AD)-specific patient attributes from clinical routine data of the neurology department with high extraction performance

  • Designing and implementing a workflow that copes with a heterogeneous clinical data warehouse required overcoming several hurdles: first, identifying what kind of information is stored in which resource; second, finding the database fields that contain the relevant semi-structured data; third, assessing, for the textual documents, whether the information can be found and extracted reliably; and fourth, determining how training and test datasets could be generated while complying with data privacy regulations

Introduction

There is a tremendous need for broad, longitudinal health-related phenotype data. This holds true especially for Alzheimer’s disease (AD), for which an etiology spanning decades, in major parts without clinical symptoms, is assumed. The German Center for Neurodegenerative Diseases (DZNE) has started large cohort studies, such as the DZNE Longitudinal Cognitive Impairment and Dementia Study (DELCODE) [4] and the DZNE Clinical Registry Study of Neurodegenerative Diseases (DESCRIBE) [5]. Despite all these efforts, it will take years to collect large subject groups and longitudinal data. Moreover, these cohorts follow their own, often unharmonized, study designs and might therefore be biased toward those designs.
