Abstract

e18775 Background: The production of high-quality real-world data requires comprehensive and meticulous data quality assurance (QA) methods to guarantee that adequate standards of accuracy, completeness, and consistency are met. Memorial Sloan Kettering Cancer Center (MSKCC) synthesizes manually curated Electronic Health Record (EHR) data to collect and harmonize the fundamental data elements across all cancer types. Centralized, real-time analysis of curated data quality allows rigorous review that identifies areas of strength and opportunities for improvement in the curation process.

Methods: MSKCC built the Core Clinical Data Element (CCDE) data model, which encompasses aspects of the PRISSMM, ASCO’s mCODE, and NAACCR tumor registry frameworks, to capture standardized real-world, pan-cancer, pan-specialty data across 11 modules, including cancer genomics, imaging, pathology, surgery, and radiation. A key component of the QA process is source data verification (SDV), the comparison of curated data against source documents to identify inconsistencies. Any discrepancies detected are classified as major or minor violations. Major violations are errors or omissions on core data elements that would affect time interval calculations, such as an incorrect procedure date. Minor violations are errors or omissions on less critical data elements, such as a missing radiation therapy dose. Identifying these inconsistencies allows the QA team to recognize patterns in curation errors and pinpoint areas for curator retraining.

Results: Because the basic data quality checks built into common data storage platforms offer limited functionality, an interactive application was developed with the R Shiny package to access data as cases are recorded and to summarize SDV findings in real time. The app has two panels, each stratified by CCDE module. The first panel details the total number of forms curated and the percentage of forms that underwent SDV, with each form belonging to one of the 11 modules. The second panel consists of a set of tables that summarize specific major and minor violations against a user-selected denominator of either patients (e.g., how many patients had a violation on at least one imaging report) or forms (e.g., how many imaging reports had a violation). We will demonstrate the utility of the app and discuss the benefits of real-time evaluation in large-scale, real-world EHR curation efforts.

Conclusions: We recommend automated, user-friendly tools to assess the data quality of such efforts. With real-time analysis, the tool supports ongoing and regular data checks, so that directives can be clarified and curators retrained as necessary early in the curation process. As the data curation efforts expand to more cancer cohorts, the app examines the data quality of each cohort to ensure consistent evaluation. This transparency about data quality helps ensure that the resulting real-world data are usable for rigorous research.
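The difference between the patient and form denominators described in the Results can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the R Shiny app's code; the record layout, the field names (`patient_id`, `module`, `form_id`, `violations`), and the function `violation_summary` are all assumptions made for this example, not part of the CCDE data model.

```python
from collections import defaultdict

# Hypothetical SDV results: one record per verified form.
# Field names and values are illustrative, not the CCDE schema.
sdv_forms = [
    {"patient_id": "P1", "module": "imaging", "form_id": "F1", "violations": ["major"]},
    {"patient_id": "P1", "module": "imaging", "form_id": "F2", "violations": []},
    {"patient_id": "P2", "module": "imaging", "form_id": "F3", "violations": ["minor", "minor"]},
    {"patient_id": "P2", "module": "radiation", "form_id": "F4", "violations": ["minor"]},
    {"patient_id": "P3", "module": "radiation", "form_id": "F5", "violations": []},
]


def violation_summary(forms, denominator="forms"):
    """Count units (forms or patients) with at least one violation, per module."""
    if denominator not in ("forms", "patients"):
        raise ValueError("denominator must be 'forms' or 'patients'")
    key = "form_id" if denominator == "forms" else "patient_id"

    totals = defaultdict(set)  # module -> all units that underwent SDV
    flagged = defaultdict(lambda: {"major": set(), "minor": set()})

    for form in forms:
        unit = form[key]
        totals[form["module"]].add(unit)
        for severity in form["violations"]:
            # Sets ensure a unit is counted once per severity level.
            flagged[form["module"]][severity].add(unit)

    return {
        module: {
            "total": len(units),
            "major": len(flagged[module]["major"]),
            "minor": len(flagged[module]["minor"]),
        }
        for module, units in totals.items()
    }


by_form = violation_summary(sdv_forms, denominator="forms")
by_patient = violation_summary(sdv_forms, denominator="patients")
```

In this made-up data, the imaging module has three verified forms but only two patients, so the same violations yield different rates depending on the denominator chosen.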
