Abstract

Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats. To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies. The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University. By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results. Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call