Abstract

Background: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling.

Methods: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices.

Results: We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing.

Conclusions: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.

Highlights

  • We present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling

  • As a result of the digitization of health systems worldwide, electronic health record (EHR) data repositories have emerged as the main source of data for medical cohort research studies

  • To determine the functionality that should be provided by a next-generation phenotype library, a team of international researchers—comprising Health Data Research UK (HDR UK) Phenomics theme members and US researchers from the Mobilizing Computable Biomedical Knowledge (MCBK) and Phenotype Execution and Modelling Architecture (PhEMA) communities—first examined a range of tools supporting different parts of the definition lifecycle, which were developed within their respective phenomics communities

Summary

Introduction

As a result of the digitization of health systems worldwide, electronic health record (EHR) data repositories have emerged as the main source of data for medical cohort research studies. While traditional big data techniques can successfully address the scale of the EHR data available, the effectiveness of phenotype definitions is affected by a range of other syntactic and semantic issues, including variations in the way data are structured and the coding systems used. To overcome these issues and enable effective cohort extraction, a phenotype definition must exhibit certain properties. Libraries should directly validate the definitions they host, through, for example, automated comparisons with gold standards. To this end, in this work we contribute a number of desiderata for the development of phenotype libraries, which ensure that definitions are accessible and maximize the quality of the phenotypes they contain by supporting all parts of the definition lifecycle. By providing access to high-quality definitions, phenotype libraries enable both efficient and accurate use of EHR data for activities such as medical research, decision support, and clinical trial recruitment.
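As an illustration of what an automated comparison with a gold standard might involve, the following is a minimal sketch only. It assumes that a phenotype definition has already been executed to produce a set of patient identifiers and that a manually reviewed gold-standard set exists for the same population. The names `ValidationResult` and `validate_against_gold_standard` are hypothetical and are not drawn from any existing phenotype library.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ValidationResult:
    """Agreement counts between an extracted cohort and a gold standard."""
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def sensitivity(self) -> float:
        # Share of gold-standard patients that the definition found.
        denominator = self.true_positives + self.false_negatives
        return self.true_positives / denominator if denominator else 0.0

    @property
    def ppv(self) -> float:
        # Share of extracted patients confirmed by the gold standard.
        denominator = self.true_positives + self.false_positives
        return self.true_positives / denominator if denominator else 0.0


def validate_against_gold_standard(
    extracted_ids: Iterable[str],
    gold_standard_ids: Iterable[str],
) -> ValidationResult:
    """Compare a cohort produced by a definition with a gold-standard cohort.

    Hypothetical helper: both arguments are collections of patient
    identifiers taken from the same underlying population.
    """
    extracted = set(extracted_ids)
    gold = set(gold_standard_ids)
    return ValidationResult(
        true_positives=len(extracted & gold),
        false_positives=len(extracted - gold),
        false_negatives=len(gold - extracted),
    )


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    result = validate_against_gold_standard(
        extracted_ids=["p1", "p2", "p3", "p4"],
        gold_standard_ids=["p2", "p3", "p4", "p5", "p6"],
    )
    print(f"sensitivity={result.sensitivity:.2f}, ppv={result.ppv:.2f}")
```

A library could run a check of this kind whenever a definition is submitted or revised and store the resulting figures alongside it. Specificity is omitted here because it would also require the full set of patients who were reviewed and found not to have the condition.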
