Abstract

PURPOSE
The American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) Biopharma Collaborative (BPC) is a multi-institution effort to build a pan-cancer repository of genomic and clinical data curated from the electronic health record. For the research community to be confident that data extracted from electronic health record text are reliable, transparency of the approach used to ensure data quality is essential.

MATERIALS AND METHODS
Four institutions participating in AACR's Project GENIE created an observational cohort of patients with cancer for whom tumor molecular profiling data, therapeutic exposures, and treatment outcomes are available and will be shared publicly with the research community. A comprehensive approach to quality assurance included assessments of (1) feasibility of the curation model through pressure test cases; (2) accuracy through programmatic queries and comparison with source data; and (3) reproducibility via double curation and code review.

RESULTS
Assessments of feasibility resulted in critical modifications to the curation directives. Queries and comparison with source data identified errors that were rectified via data correction and curator retraining. Assessment of intercurator reliability indicated a reliable curation model.

CONCLUSION
The transparent quality assurance processes for the GENIE BPC data ensure that the data can be used for analyses that support clinical decision making and advances in precision oncology.

Highlights

  • The future of precision medicine in oncology requires detailed patient data alongside molecular characterization of tumors to allow for discovery, risk stratification and to inform the selection of optimal therapy.[1]

  • The transparent quality assurance processes for the GENIE BPC data ensure that the data can be used for analyses that support clinical decision making and advances in precision oncology.

  • Although efforts such as The Cancer Genome Atlas have molecularly characterized more than 20,000 primary cancer tumors and have led to insights regarding the genomic landscape of many cancers, phenomic data that include clinical characteristics, therapeutic exposures, and salient outcomes are limited in The Cancer Genome Atlas and similar data sources.[2]

Introduction

The future of precision medicine in oncology requires detailed patient data alongside molecular characterization of tumors to allow for discovery, risk stratification and to inform the selection of optimal therapy.[1] With more treatments available and the need to make more rapid treatment decisions, intermediate end points such as progression-free survival (PFS) are routinely used for patients treated as part of clinical trials. Because the majority of patient care occurs outside of the clinical trial setting, there is a need for robust curation of intermediate end points such as real-world progression-free survival. Much of the critical data characterizing end points such as treatment duration, toxicity, progression, and recurrence are not captured in structured data fields in the electronic health record (EHR), posing challenges for the collection and synthesis of these data.[3]
