Abstract

Objective: The aim of the study was to transform a resource of linked electronic health records (EHR) to the OMOP common data model (CDM) and evaluate the process in terms of syntactic and semantic consistency and quality when implementing disease and risk factor phenotyping algorithms.

Materials and Methods: Using heart failure (HF) as an exemplar, we transformed three national EHR sources (Clinical Practice Research Datalink, Hospital Episode Statistics Admitted Patient Care, Office for National Statistics) into the OMOP CDM 5.2. We compared the original and CDM HF patient populations by calculating and presenting descriptive statistics of demographics, related comorbidities, and relevant clinical biomarkers.

Results: We identified a cohort of 502 536 patients with incident and prevalent HF and converted 1 099 195 384 rows of data from 216 581 914 encounters across three EHR sources to the OMOP CDM. The largest percentage (65%) of unmapped events was related to medication prescriptions in primary care. The average coverage of source vocabularies was >98%, with the exception of laboratory tests recorded in primary care. The raw and transformed data were similar in terms of demographics and comorbidities, with the largest difference observed being 3.78% in the prevalence of chronic obstructive pulmonary disease (COPD).

Conclusion: Our study demonstrated that the OMOP CDM can successfully be applied to convert EHR linked across multiple healthcare settings and to represent phenotyping algorithms spanning multiple sources. As in previous research, challenges in mapping primary care prescriptions and laboratory measurements persist and require further work. The use of the OMOP CDM with national UK EHR is a valuable research tool that can enable large-scale reproducible observational research.

Highlights

  • The combination of electronic health record data with large biobank cohort studies has scaled the breadth and depth of genetic discoveries to identify hundreds of thousands of novel associations between variants and phenotypes derived from electronic health records (EHR) through analyses such as phenome-wide association studies (PheWAS).[3]

  • Common data models (CDMs), such as the OMOP CDM,[5] managed by the Observational Health Data Science and Informatics (OHDSI) community, or the PCORNet CDM,[6] enable researchers to integrate and analyze information contained in disparate observational data sources by mapping data into a common format with a robust specification.

  • We mapped 109 772 terms across five controlled clinical terminologies used in the source EHR data for diagnoses, procedures, observations, measurements, deaths, devices, and medication to CDM Concepts (Table 1).


Summary

Introduction

The combination of electronic health record data with large biobank cohort studies (eg, UK Biobank,[1] eMERGE[2]) has scaled the breadth and depth of genetic discoveries to identify hundreds of thousands of novel associations between variants and phenotypes (and endotypes) derived from EHR through analyses such as phenome-wide association studies (PheWAS).[3] In the United States and elsewhere, researchers have converted EHR and claims data to the OMOP CDM to enable federated analyses of disparate sources of information.[7,8] In the United Kingdom, the Clinical Practice Research Datalink (CPRD)[9] and The Health Improvement Network (THIN)[10] have been converted to the OMOP CDM. In both of these cases, extensive transformations were performed to map bespoke data provider formats and UK-specific clinical terminologies, and researchers evaluated the quality of the CDM in terms of replicating existing epidemiological analyses performed in the raw data sources.
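
As a rough illustration of what mapping source terminologies to CDM Concepts and reporting vocabulary coverage can involve, the sketch below looks up each source event in a mapping table and computes the percentage of mapped events per vocabulary. It is not the study's pipeline: the lookup table, the source codes (CODE_A and so on), and the concept identifiers are hypothetical placeholders. The only convention taken from the OMOP CDM is the use of concept id 0 for events with no matching concept.

```python
from collections import Counter

# OMOP CDM convention: concept_id 0 denotes "no matching concept".
UNMAPPED_CONCEPT_ID = 0

# Hypothetical lookup: (source vocabulary, source code) -> standard concept id.
# Codes and ids are placeholders, not real vocabulary content.
SOURCE_TO_CONCEPT = {
    ("Read", "CODE_A"): 1001,
    ("Read", "CODE_B"): 1002,
    ("ICD10", "CODE_C"): 1001,
}


def map_events(events):
    """Attach a concept id to each (vocabulary, code) event; 0 if unmapped."""
    return [
        (vocab, code, SOURCE_TO_CONCEPT.get((vocab, code), UNMAPPED_CONCEPT_ID))
        for vocab, code in events
    ]


def coverage_by_vocabulary(mapped):
    """Return the percentage of events per vocabulary that received a concept."""
    total = Counter()
    hit = Counter()
    for vocab, _code, concept_id in mapped:
        total[vocab] += 1
        if concept_id != UNMAPPED_CONCEPT_ID:
            hit[vocab] += 1
    return {vocab: 100.0 * hit[vocab] / total[vocab] for vocab in total}


# Toy input: CODE_X has no entry in the lookup, so it stays unmapped.
events = [
    ("Read", "CODE_A"),
    ("Read", "CODE_B"),
    ("Read", "CODE_X"),
    ("ICD10", "CODE_C"),
]
mapped = map_events(events)
coverage = coverage_by_vocabulary(mapped)
```

Keeping unmapped events in the output with concept id 0, rather than dropping them, is what makes it possible to report which domains account for the unmapped share.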
