Chapter 13: Mining Electronic Health Records in the Genomics Era

Joshua C Denny

doi:10.1371/journal.pcbi.1002823

Open Access

https://doi.org/10.1371/journal.pcbi.1002823

Journal: PLoS Computational Biology	Publication Date: Dec 27, 2012
Citations: 222	License type: CC BY 4.0

Affiliation: Vanderbilt University

Abstract

The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care, but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information. However, EHRs are designed primarily for clinical care, not research, so reuse of clinical EHR data for research purposes can be challenging. Difficulties in using EHR data include limited data availability, missing data, incorrect data, and vast quantities of unstructured narrative text. Structured information includes billing codes, most laboratory reports, and other variables such as physiologic measurements and demographic information. Significant information, however, remains locked within EHR narrative text documents, including clinical notes and certain categories of test results, such as pathology and radiology reports. For relatively rare observations, combinations of simple free-text searches and billing codes may prove adequate when followed by manual chart review. However, to extract the large cohorts necessary for genome-wide association studies, natural language processing methods to process narrative text data may be needed. Combinations of structured and unstructured textual data can be mined to generate high-validity collections of cases and controls for a given condition. Once high-quality cases and controls are identified, EHR-derived cases can be used for genomic discovery and validation. Since EHR data include a broad sampling of clinically relevant phenotypic information, they may enable multiple genomic investigations upon a single set of genotyped individuals. This chapter reviews several examples of phenotype extraction and their application to genetic research, demonstrating a viable future for genomic discovery using EHR-linked data.
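
The idea of combining billing codes with free-text searches to separate cases, controls, and records needing manual chart review can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the `PatientRecord` structure, the two-code threshold, the keyword, the example codes, and the very crude negation check, which stands in for real natural language processing.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately crude negation cues; real clinical NLP
# systems handle negation and context far more carefully.
NEGATION_CUES = re.compile(r"\b(no|not|denies|without|negative for)\b", re.IGNORECASE)


@dataclass
class PatientRecord:
    """Illustrative record: structured billing codes plus narrative notes."""
    patient_id: str
    billing_codes: list = field(default_factory=list)
    notes: list = field(default_factory=list)


def count_matching_codes(record, code_prefixes):
    """Count billing codes that start with any of the given prefixes."""
    prefixes = tuple(code_prefixes)
    return sum(1 for code in record.billing_codes if code.startswith(prefixes))


def has_any_mention(record, keyword):
    """True if the keyword appears anywhere in the notes, negated or not."""
    return any(keyword.lower() in note.lower() for note in record.notes)


def has_affirmed_mention(record, keyword):
    """True if some sentence mentions the keyword with no negation cue before it."""
    kw = keyword.lower()
    for note in record.notes:
        for sentence in re.split(r"[.;\n]", note):
            lowered = sentence.lower()
            pos = lowered.find(kw)
            if pos != -1 and not NEGATION_CUES.search(lowered[:pos]):
                return True
    return False


def classify(record, code_prefixes, keyword, min_codes=2):
    """Return 'case', 'control', or 'review' (ambiguous; needs chart review)."""
    n_codes = count_matching_codes(record, code_prefixes)
    if n_codes >= min_codes and has_affirmed_mention(record, keyword):
        return "case"
    if n_codes == 0 and not has_any_mention(record, keyword):
        return "control"
    return "review"


# Usage with made-up records; codes and keyword are illustrative only.
records = [
    PatientRecord("p1", ["714.0", "714.0"], ["Follow-up for rheumatoid arthritis. On methotrexate."]),
    PatientRecord("p2", ["401.9"], ["Hypertension, well controlled."]),
    PatientRecord("p3", ["714.0"], ["No evidence of rheumatoid arthritis."]),
]
labels = {r.patient_id: classify(r, ["714.0"], "rheumatoid arthritis") for r in records}
```

Requiring agreement between structured and narrative evidence for cases, and the absence of both for controls, trades cohort size for validity; records that satisfy neither rule are set aside for manual review rather than guessed at.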

Highlights

Typical genetic research studies have used purpose-built cohorts or observational studies for genetic research
Unavailability may result from clinics that are slow adopters, have very high patient volumes, or have specific workflows not well accommodated by the electronic health record (EHR) system [25]
EHRs have long been seen as a vehicle to improve healthcare quality, cost, and safety

Summary

Introduction and Motivation

Typical genetic research studies have relied on purpose-built cohorts or observational studies. Rare diseases may take a significant time to accrue in these datasets. Another model that is gaining acceptance is genetic discovery based solely or partially on phenotype information derived from the electronic health record (EHR) [6]. Both study designs share costs for obtaining and storing DNA. Another advantage of EHR-linked DNA databanks is the potential to reuse genetic information to investigate a broad range of additional phenotypes beyond the original study. This is true for dense genetic data such as those generated through genome-wide association studies or large-scale sequencing. Another example is the Kaiser Permanente Research Program on Genes, Environment and Health, which has genotyped 100,000 members with linked EHR data [8].

Classes of Data Available in EHRs

Summary

Documentation from Reports and Tests

Natural Language Processing to Support Clinical Knowledge Extraction

EHR-Associated Biobanks

Race and Ethnicity in EHR-Derived Biobanks

Measure of Phenotype Selection Logic Performance

Creation of Phenotype Selection Logic

Methods

Examples of Genetic Discovery Using EHRs

Replicating Known Genetic Associations for Five Diseases

Demonstrating Multiethnic

Early Genome-Wide Association Studies from the eMERGE Network

Conclusions and Future Directions

Findings

Exercises
