Abstract

Primary care electronic health records are widely used for research. The benefits of such records lie in their size, longitudinal data collection and data quality. However, using such data to undertake high-quality epidemiological studies can pose significant challenges, particularly in dealing with misclassification, variation in coding and the significant effort required to pre-process the data into a meaningful format for statistical analysis. In this paper, we describe a methodology to aid the extraction and processing of such databases, delivered by a novel software programme: “Data extraction for epidemiological research” (DExtER). DExtER is based on the principles of extract, transform and load (ETL) processes. The tool first allows the healthcare dataset to be extracted and then transformed into a format in which the data are normalised, converted and reformatted. DExtER has a user interface designed to obtain data extracts specific to each research question and observational study design. There are facilities to input the requirements for the eligible study period, the definition of exposed and unexposed groups, outcome measures and important baseline covariates. To date, the tool has been used and validated in a multitude of settings. More than 35 peer-reviewed publications have used the tool, and DExtER has been implemented as a validated public health surveillance tool for obtaining accurate statistics on the epidemiology of key morbidities. Future work will apply the framework to linked and international datasets and develop standardised methods for electronic pre-processing and extraction of datasets for research purposes.

Highlights

  • Advancements in technology and healthcare systems have enabled large-scale collection of longitudinal electronic health records [1]

  • We introduce “Data extraction for epidemiological research” (DExtER), an extract, transform and load (ETL) based software framework that enables automated clinical epidemiological studies (ACES) in a reproducible and verifiable way

  • For each code entity with exclusion criteria, if the exposure type is “Exclude if ever recorded”: find the event described by the code entity; if it is found, exclude the patient and document the reason for rejection

  • We explored the association between Type 1 Diabetes and subsequent risk of developing epilepsy using a cohort study design [49]
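
The “Exclude if ever recorded” rule in the highlights can be illustrated with a minimal Python sketch. This is not DExtER's actual implementation: the `Patient` record layout, the `ExclusionResult` container, the function name and the example codes are all hypothetical, chosen only to show the logic of scanning a patient's events, excluding the patient on the first match and recording why.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Patient:
    # Hypothetical record layout: each event is a (clinical_code, date) pair.
    patient_id: str
    events: List[Tuple[str, str]]


@dataclass
class ExclusionResult:
    eligible: List[str] = field(default_factory=list)
    # Maps each rejected patient ID to a human-readable rejection reason.
    rejected: Dict[str, str] = field(default_factory=dict)


def apply_exclude_if_ever_recorded(
    patients: List[Patient], exclusion_codes: Set[str]
) -> ExclusionResult:
    """Exclude any patient with at least one event matching an exclusion code."""
    result = ExclusionResult()
    for patient in patients:
        # Find the first event whose code belongs to the exclusion code entity.
        match = next(
            (event for event in patient.events if event[0] in exclusion_codes),
            None,
        )
        if match is None:
            result.eligible.append(patient.patient_id)
        else:
            code, date = match
            # Document the rejection so the extract stays verifiable.
            result.rejected[patient.patient_id] = (
                f"excluded: code {code} recorded on {date}"
            )
    return result
```

Keeping the rejection reasons alongside the eligible list, rather than silently dropping patients, is one way to make a cohort definition auditable after the fact.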


Summary

Introduction

Advancements in technology and healthcare systems have enabled large-scale collection of longitudinal electronic health records [1]. In the UK, there are many primary care databases (THIN, CPRD, QResearch and ResearchOne) of anonymised patient records [2, 3]. The data within primary care databases are derived from the healthcare software systems used to manage patients’ clinical data [7, 8]. These systems are designed to help healthcare professionals access and manage clinical data rather than for research purposes. As such, these datasets present several challenges.
