In the United States, a comprehensive national breast cancer registry (CR) does not exist. Thus, care and coverage decisions are based on data from population subsets, other countries, or models. We report a prototype real-world research data mart to assess mortality, morbidity, and costs for breast cancer diagnosis and treatment. With institutional review board approval and Health Insurance Portability and Accountability Act (HIPAA) compliance, a multidisciplinary group of clinical and research data warehouse (RDW) experts curated demographic, risk, imaging, pathology, treatment, and outcome data from the electronic health record (EHR), radiology information system (RIS), and CR for patients who had breast imaging and/or a diagnosis of breast cancer at our institution from January 1, 2004, to December 31, 2020. Domains were defined as prebuilt views that extract data from the existing RDW, denormalized according to requirements, using an extract, transform, load pattern. Data dictionaries were included. Structured query language was used for data cleaning. Five hundred eighty-nine elements (EHR 311, RIS 211, and CR 67) were mapped to 27 domains; all except one, which contained CR elements, had cancer and noncancer cohort views, resulting in a total of 53 views (average 12 elements/view; range, 4-67). EHR and RIS queries returned 497,218 patients with 2,967,364 imaging examinations and associated visit details. Cancer biology, treatment, and outcome details for 15,619 breast cancer cases were imported from the CR of our primary breast care facility for this prototype mart. Institutional real-world data marts enable a comprehensive understanding of care outcomes within an organization. As clinical data sources become increasingly structured, such marts may become an important source for future interinstitutional analysis and, potentially, an opportunity to create robust real-world results that could support evidence-based national policy and care decisions for breast cancer.
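To illustrate the idea of one domain exposed as paired cancer and noncancer cohort views, the following is a minimal sketch in Python using the standard-library sqlite3 module. It is not the authors' implementation: the table names (patient, imaging_exam, cancer_registry), the view names (imaging_cancer, imaging_noncancer), the columns, and the sample rows are all hypothetical stand-ins for an RDW and were chosen only for illustration.

```python
import sqlite3

# Hypothetical view names for one illustrative "imaging" domain.
COHORT_VIEWS = ("imaging_cancer", "imaging_noncancer")


def build_demo_mart():
    """Create an in-memory database with toy source tables and two cohort views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical normalized source tables standing in for the RDW.
        CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, birth_year INTEGER);
        CREATE TABLE imaging_exam (
            exam_id INTEGER PRIMARY KEY,
            patient_id INTEGER NOT NULL,
            exam_date TEXT,
            modality TEXT
        );
        CREATE TABLE cancer_registry (
            patient_id INTEGER PRIMARY KEY,
            diagnosis_date TEXT
        );

        -- Denormalized view: imaging rows for patients found in the registry.
        CREATE VIEW imaging_cancer AS
        SELECT p.patient_id, p.birth_year, e.exam_id, e.exam_date, e.modality
        FROM patient p
        JOIN imaging_exam e ON e.patient_id = p.patient_id
        WHERE p.patient_id IN (SELECT patient_id FROM cancer_registry);

        -- Same columns, restricted to patients absent from the registry.
        CREATE VIEW imaging_noncancer AS
        SELECT p.patient_id, p.birth_year, e.exam_id, e.exam_date, e.modality
        FROM patient p
        JOIN imaging_exam e ON e.patient_id = p.patient_id
        WHERE p.patient_id NOT IN (SELECT patient_id FROM cancer_registry);
        """
    )
    # Fabricated sample rows, for demonstration only.
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?)",
        [(1, 1960), (2, 1975), (3, 1982)],
    )
    conn.executemany(
        "INSERT INTO imaging_exam VALUES (?, ?, ?, ?)",
        [
            (10, 1, "2015-03-02", "mammography"),
            (11, 1, "2015-03-20", "ultrasound"),
            (12, 2, "2018-07-11", "mammography"),
            (13, 3, "2019-01-05", "mammography"),
        ],
    )
    conn.execute("INSERT INTO cancer_registry VALUES (?, ?)", (1, "2015-03-25"))
    conn.commit()
    return conn


def count_rows(conn, view_name):
    """Return the row count of one of the known cohort views."""
    if view_name not in COHORT_VIEWS:
        raise ValueError(f"unknown view: {view_name}")
    return conn.execute(f"SELECT COUNT(*) FROM {view_name}").fetchone()[0]
```

Calling build_demo_mart() and then count_rows(conn, "imaging_cancer") queries the cancer cohort view without touching the underlying tables directly. Under the same pairing scheme, the view count reported above follows from 26 domains with two views each plus the single CR-only domain: 26 × 2 + 1 = 53.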