High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP).

Yichi Zhang, Kelly Cho, Nicholas Link, David Gagnon, Zongqi Xia, Jie Huang, Tianrun Cai, Shawn N Murphy, Jacqueline Honerlaw, Sicong Huang, Sheng Yu, Ashwin N Ananthakrishnan, Katherine P Liao, Guergana Savova, Tianxi Cai, J Michael Gaziano, Elizabeth W Karlson, Chuan Hong, Stanley Y Shaw, Jiehuan Sun, Peter Szolovits, Susanne Churchill, Robert M Plenge, Vivian S Gainer, Víctor M Castro, Isaac S Kohane, Yuk‐Lam Ho, Christopher J O’Donnell

doi:10.1038/s41596-019-0227-6

Abstract

Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
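
The three final products named in the abstract (an algorithm, a per-patient probability, and a yes/no classification) can be illustrated with a small sketch. This is not the PheCAP implementation: it swaps in a plain ridge-penalized logistic regression fitted by gradient descent, covers only the supervised training step on chart-reviewed labels, and omits the automated feature selection. The patient data are simulated, and the feature names (ICD code count, NLP mention count), the log(1 + x) transform, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Fit logistic regression by batch gradient descent with a ridge penalty.

    Stand-in for the penalized regression step; not the PheCAP algorithm itself.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
    return w, b


def predict_proba(X, w, b):
    # Probability of the phenotype for each patient.
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def simulate_patients(n, seed=0):
    """Simulated cohort (hypothetical): code and NLP mention counts per patient."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        case = rng.random() < 0.3
        icd_count = rng.randint(2, 10) if case else rng.randint(0, 3)
        nlp_count = rng.randint(3, 15) if case else rng.randint(0, 4)
        # log(1 + count) transform is an assumption of this sketch.
        X.append([math.log1p(icd_count), math.log1p(nlp_count)])
        y.append(1 if case else 0)
    return X, y


# All patients have features; only a small subset has chart-review labels.
X_all, y_true = simulate_patients(500)
labeled_idx = list(range(100))  # pretend these 100 charts were reviewed

# Product 1: the phenotype algorithm (here, fitted weights and intercept).
w, b = fit_logistic([X_all[i] for i in labeled_idx],
                    [y_true[i] for i in labeled_idx])

# Product 2: probability of the phenotype for all patients.
probs = predict_proba(X_all, w, b)

# Product 3: yes/no classification at a hypothetical threshold.
THRESHOLD = 0.5
classes = ["yes" if p >= THRESHOLD else "no" for p in probs]
```

In practice the threshold would be chosen from validation data to reach a target specificity or positive predictive value rather than fixed at 0.5.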

Journal: Nature Protocols
Publication Date: Nov 20, 2019
Citations: 115
