Abstract

Electronic health records (EHRs) are widely adopted and have great potential to serve as a rich, integrated source of phenotype information. Computational phenotyping, which automatically extracts phenotypes from EHR data, can accelerate the adoption and use of phenotype-driven efforts to advance scientific discovery and improve healthcare delivery. A number of computational phenotyping algorithms have been published, but data fragmentation, i.e., incomplete data within a single data source, has been raised as an inherent limitation of computational phenotyping. In this study, we investigated the impact of diverse data sources on two published computational phenotyping algorithms, for rheumatoid arthritis (RA) and type 2 diabetes mellitus (T2DM), using Mayo EHRs and the Rochester Epidemiology Project (REP), which links medical records from multiple health care systems. Results showed that case selection for both RA (less prevalent) and T2DM (more prevalent) was markedly impacted by data fragmentation: in Mayo data, the positive predictive value (PPV) was 91.4% for RA and 92.4% for T2DM, and the false-negative rate (FNR) was 26.6% and 14%, respectively; in REP, the PPV was 97.2% and 98.3%, and the FNR was 5.2% and 3.3%. T2DM control selection was also biased, with a PPV of 91.2% and an FNR of 1.2% in Mayo data. We further elaborate on the underlying reasons affecting performance.
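For readers less familiar with the two metrics reported above, the following minimal Python sketch shows how PPV and FNR are conventionally computed from confusion counts. The function names and the counts are illustrative assumptions and are not taken from the study.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: fraction of selected cases that are true cases."""
    selected = true_pos + false_pos
    if selected == 0:
        raise ValueError("no cases were selected")
    return true_pos / selected


def fnr(true_pos: int, false_neg: int) -> float:
    """False-negative rate: fraction of true cases that the algorithm missed."""
    actual = true_pos + false_neg
    if actual == 0:
        raise ValueError("no true cases in the benchmark")
    return false_neg / actual


# Hypothetical counts for illustration only (not data from the study).
tp, fp, fn = 90, 10, 30
print(f"PPV = {ppv(tp, fp):.1%}")  # PPV = 90.0%
print(f"FNR = {fnr(tp, fn):.1%}")  # FNR = 25.0%
```

A high PPV together with a high FNR, as in the single-source results, means that the selected cases are mostly correct but many true cases are never selected.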

Highlights

  • EHRs, whose availability has increased under the HITECH Act, have great potential to serve as a rich, integrated source of phenotype information (Denny et al, 2011; Crawford et al, 2014)

  • Computational phenotyping algorithms covering over fifty diseases (Phenotype KnowledgeBase (PheKB), 2019), including rheumatoid arthritis (RA) (Liao et al, 2010; Partners Phenotyping Group, 2016) and type 2 diabetes mellitus (T2DM), are available at PheKB and were primarily developed through the Electronic Medical Records and Genomics (eMERGE) Network

  • Using both Mayo Clinic EHRs and Rochester Epidemiology Project (REP) data, we identified 620 RA cases (42,319 controls) and 5,215 T2DM cases (6,293 controls) to serve as our benchmark for the analyses


Summary

Introduction

EHRs, whose availability has increased under the HITECH Act, have great potential to serve as a rich, integrated source of phenotype information (Denny et al, 2011; Crawford et al, 2014). Critical to this effort is computational phenotyping, which identifies patients with certain conditions of interest from EHR data (Gunasekar et al, 2016). Developing research-quality EHR-based computational phenotyping algorithms that categorize diseases or traits in complete populations is not an easy task, because EHR data are collected primarily for healthcare delivery and reimbursement (Wei and Denny, 2015). Using data from two healthcare institutions, a previous study demonstrated that running a T2DM phenotyping algorithm developed by researchers from Northwestern University on data from a single institution missed almost one third of the T2DM cases (Wei et al, 2012).
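The effect of fragmentation on case selection can be illustrated with a toy Python sketch. Everything in it is a hypothetical assumption: the records, the site names, and the rule (at least two T2DM diagnosis codes). It is not the published Northwestern algorithm, only a simplified stand-in showing how a patient whose evidence is split across institutions can be missed.

```python
from collections import Counter

# Hypothetical toy records: (patient_id, source, diagnosis_code).
RECORDS = [
    ("p1", "site_a", "T2DM"),
    ("p1", "site_a", "T2DM"),
    ("p2", "site_a", "T2DM"),
    ("p2", "site_b", "T2DM"),
    ("p3", "site_b", "T2DM"),
    ("p3", "site_b", "T2DM"),
]


def select_cases(records, sources, min_codes=2):
    """Toy rule (assumed for illustration): a patient is a case if they
    have at least `min_codes` T2DM codes within the given sources."""
    counts = Counter(
        pid for pid, src, code in records
        if src in sources and code == "T2DM"
    )
    return {pid for pid, n in counts.items() if n >= min_codes}


single = select_cases(RECORDS, {"site_a"})            # one institution only
linked = select_cases(RECORDS, {"site_a", "site_b"})  # linked records
missed = linked - single                              # cases lost to fragmentation

print(sorted(single))  # ['p1']
print(sorted(linked))  # ['p1', 'p2', 'p3']
print(sorted(missed))  # ['p2', 'p3']
```

In this sketch, patient p2 has one code at each site and p3 is seen only at the second site, so both are invisible to the rule when it runs on the first site alone.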

