Electronic health records (EHRs) are a rich source of real-world data for cancer studies, but they often lack critical structured data elements such as diagnosis date and disease stage. Fortunately, such concepts are available from hospital cancer registries. We describe experiences from integrating cancer registry data with EHR and billing data in an interoperable data model across a multisite clinical research network. After sites loaded cancer registry data into a tumor table compatible with the PCORnet Common Data Model (CDM), distributed queries were performed to assess data quality. After remediation of quality issues, another query produced descriptive frequencies of cancer types and demographic characteristics, including linked body mass index (BMI). We also report two current use cases of the new resource. Eleven sites implemented the tumor table, overcoming institutional and technical barriers and yielding a resource with data for 572,902 tumors. Racial and ethnic distributions varied across sites: the percentage of tumors among Black patients ranged from <1% to 15%, and the percentage among Hispanic patients ranged from 1% to 46%. Current use cases include a pragmatic prospective cohort study of a rare cancer and a retrospective cohort study leveraging body size and chemotherapy dosing. Integrating cancer registry data with the PCORnet CDM across multiple institutions creates a powerful resource for cancer studies. It provides a wider array of structured, cancer-relevant concepts, and it allows investigators to examine variability in those concepts across many treatment environments. Having the CDM tumor table in place enhances the network's effectiveness for real-world cancer research.