Abstract

Background

Electronic Health Records (EHRs) contain a wealth of information useful for studying clinical phenotype-genotype relationships. Severity is important for distinguishing among phenotypes; however, other severity indices classify patient-level severity (e.g., mild vs. acute dermatitis) rather than phenotype-level severity (e.g., acne vs. myocardial infarction). Phenotype-level severity is relative to other phenotypes and does not change with the individual patient's state. For example, acne is mild at the phenotype level relative to other phenotypes. A given patient may have a severe form of acne (this is the patient-level severity), but this does not affect its overall designation as a mild phenotype at the phenotype level.

Methods

We present a method for classifying severity at the phenotype level that uses the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED-CT). Our method is called the Classification Approach for Extracting Severity Automatically from Electronic Health Records (CAESAR). CAESAR combines multiple severity measures: number of comorbidities, medications, procedures, cost, treatment time, and a proportional index term. CAESAR employs a random forest algorithm and these severity measures to discriminate between severe and mild phenotypes.

Results

Using a random forest algorithm and these severity measures as input, CAESAR differentiates between severe and mild phenotypes (sensitivity = 91.67, specificity = 77.78) when compared to a manually evaluated reference standard (k = 0.716).

Conclusions

CAESAR enables researchers to measure phenotype severity from EHRs to identify phenotypes that are important for comparative effectiveness research.
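The sensitivity and specificity reported in the Results compare CAESAR's severe/mild labels against the reference standard. A minimal sketch of how such figures are computed is given here, assuming that "severe" is treated as the positive class and that the reported values are percentages. The function name and the toy labels are hypothetical and are not taken from the study data.

```python
def sensitivity_specificity(y_true, y_pred, positive="severe"):
    """Return (sensitivity, specificity) as percentages.

    y_true: reference-standard labels, y_pred: classifier labels.
    Treating "severe" as the positive class is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = 100.0 * tp / (tp + fn)  # share of severe phenotypes found
    specificity = 100.0 * tn / (tn + fp)  # share of mild phenotypes kept mild
    return sensitivity, specificity


# Toy labels for illustration only (not the paper's reference standard).
reference = ["severe", "severe", "severe", "severe",
             "mild", "mild", "mild", "mild", "mild"]
predicted = ["severe", "severe", "severe", "mild",
             "mild", "mild", "mild", "mild", "severe"]

sens, spec = sensitivity_specificity(reference, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy case one severe phenotype is missed and one mild phenotype is flagged as severe, so both measures fall below 100.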

Highlights

  • Electronic Health Records (EHRs) contain a wealth of information useful for studying clinical phenotype-genotype relationships

  • Assessment of phenotype severity: Severe phenotypes in general are more prevalent in EHRs because in-patient records contain “sicker” individuals when compared to the general population, which can introduce Berkson's bias [36]

  • For condition/phenotype information we used data from Columbia University Medical Center (CUMC) EHRs, which were initially recorded using ICD-9 codes. These ICD-9 codes were mapped to SNOMED-CT codes using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) v.4 [2]

Summary

Introduction

Electronic Health Records (EHRs) contain a wealth of information useful for studying clinical phenotype-genotype relationships. Many national and international organizations were formed to study clinically meaningful Health Outcomes of Interest (HOIs). These included the Observational Medical Outcomes Partnership (OMOP), which standardized HOIs. Multiple hypothesis correction methods aim to reduce the false positive rate, but they strongly penalize a large phenotype selection space. A method is needed that efficiently reduces the phenotype selection space to include only important phenotypes. This would reduce the number of false positives in our results and allow us to prioritize phenotypes for comparative effectiveness research (CER) and rank them by severity.
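To illustrate why a large phenotype selection space is penalized, the sketch below uses the Bonferroni correction as one common example of a multiple hypothesis correction; the text does not say which correction is intended, so this choice is an assumption. The phenotype counts and the significance level are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


alpha = 0.05             # conventional significance level (assumed)
all_phenotypes = 10000   # hypothetical size of the full selection space
severe_only = 500        # hypothetical size after keeping severe phenotypes

full_threshold = bonferroni_threshold(alpha, all_phenotypes)
reduced_threshold = bonferroni_threshold(alpha, severe_only)

print(f"all phenotypes:    p < {full_threshold:.1e}")
print(f"severe phenotypes: p < {reduced_threshold:.1e}")
```

With fewer phenotypes under test, each individual association has to clear a less stringent threshold, which is the motivation for narrowing the selection space before testing.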

