Introduction: Real-world data (RWD) is increasingly used to support clinical and regulatory decisions where the use of clinical trial data is difficult or impractical. This expanded use brings a need to ensure that RWD is of sufficient quality and completeness to answer relevant research questions. Given the importance of overall survival (OS) for oncology research, treatment, and care, complete and accurate capture of real-world mortality data is critical. Current industry-standard practice aggregates mortality data across available data sources to maximize data capture. This study sought to validate and benchmark a composite real-world mortality variable against the gold-standard data source, the National Death Index (NDI).
Methods: This observational study was conducted using the COTA real-world database, a de-identified database derived from the electronic health records (EHRs) of partnered healthcare providers in the United States. Patients diagnosed with select cancers between January 1, 2015, and December 31, 2020, were included. The cancer types included were acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), marginal zone B-cell lymphoma (MZL), multiple myeloma (MM), and myelodysplastic syndrome (MDS). COTA's composite mortality variable uses structured and unstructured data from the EHRs of COTA-partnered healthcare provider sites and commercially available obituary data, including the Social Security Administration death master file. For the validation, COTA records were matched to NDI records using an NDI-developed matching algorithm. Patients without an NDI match were considered alive according to the NDI. Validation metrics were calculated in the overall population and by subgroup, including cancer type and key demographic characteristics. Metrics assessed were sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and date concordance (exact, ±7 days, ±15 days, ±30 days). Corresponding 95% confidence intervals were estimated using the Wilson method for proportions.
Results: The final study population included 21,567 patients across 7 cancer types (AML N=2638, CLL N=2502, DLBCL N=6066, FL N=2835, MM N=4234, MDS N=2295, and MZL N=997). Within the overall study population, validation analysis comparing the real-world composite mortality variable to the NDI demonstrated high sensitivity (87.8%), specificity (95.7%), PPV (90.9%) and NPV (94.1%) (Table 1). Some variability in validation metrics by disease was observed, with MM having the highest sensitivity (90.4%) and FL having the highest specificity (99.0%). Sensitivity and specificity were greater than 80% across all cancer types. Patients 18-29 years of age had the lowest sensitivity (75.0%), while patients 90 years of age or older had the highest sensitivity (92.7%) but the lowest specificity (77.1%). Validation metrics were similar across sex, race, and year of diagnosis subgroups. Date concordance between the real-world composite mortality variable and NDI dates of death was high. In the study population (N=21,567), exact date concordance was observed in 88.0% of patients, and concordance rates for the 7-, 15- and 30-day intervals were 93.1%, 93.8%, and 94.3%, respectively (Table 2). Rates of concordance were consistent across disease subtypes, subgroups, and date intervals.
Conclusions: This study found that a composite real-world mortality variable performed strongly against the gold-standard NDI database, as measured by sensitivity, specificity, PPV and NPV. Additionally, high rates of date concordance were observed across the exact, 7-, 15- and 30-day intervals. These findings are critical to ensuring the reliability of results generated using RWD, and they further establish the use of a composite real-world mortality variable as best practice. Future research will extend this validation methodology to a cohort of patients with solid tumor cancers and will investigate the performance of overall survival estimates based on a composite real-world mortality variable.
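The validation metrics named in the Methods can be illustrated with a minimal Python sketch. It is not the study's code: the record layout, the function names (`wilson_ci`, `validation_metrics`, `date_concordance`), and the choice to compute date concordance only among patients with a death date in both sources are assumptions made for illustration, since the abstract does not specify the concordance denominator. The NDI is treated as the reference standard, and a missing NDI date stands for "no NDI match, considered alive".

```python
from datetime import date
from math import sqrt
from typing import Optional

# One record per patient: (composite death date, NDI death date).
# None means "no death recorded"; for the NDI this mirrors the Methods rule
# that patients without an NDI match are treated as alive.
# This layout is hypothetical, chosen only for the example.
Record = tuple[Optional[date], Optional[date]]

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_ci(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def validation_metrics(records: list[Record]) -> dict:
    """Sensitivity, specificity, PPV and NPV with Wilson 95% intervals.

    The NDI column is the reference standard; the composite column is the
    variable under test. Metrics with an empty denominator are omitted.
    """
    tp = sum(1 for comp, ndi in records if comp is not None and ndi is not None)
    fp = sum(1 for comp, ndi in records if comp is not None and ndi is None)
    fn = sum(1 for comp, ndi in records if comp is None and ndi is not None)
    tn = sum(1 for comp, ndi in records if comp is None and ndi is None)
    counts = {
        "sensitivity": (tp, tp + fn),  # deaths captured among NDI deaths
        "specificity": (tn, tn + fp),  # alive in both among NDI alive
        "ppv": (tp, tp + fp),          # NDI-confirmed among composite deaths
        "npv": (tn, tn + fn),          # NDI alive among composite alive
    }
    return {
        name: (hits / total, wilson_ci(hits, total))
        for name, (hits, total) in counts.items()
        if total > 0
    }


def date_concordance(records: list[Record], windows=(0, 7, 15, 30)) -> dict:
    """Share of deaths whose two dates agree within each window (in days).

    Assumption: only patients with a death date in both sources are counted.
    A window of 0 corresponds to exact agreement.
    """
    gaps = [
        abs((comp - ndi).days)
        for comp, ndi in records
        if comp is not None and ndi is not None
    ]
    if not gaps:
        raise ValueError("no patients with a death date in both sources")
    return {w: sum(g <= w for g in gaps) / len(gaps) for w in windows}
```

Calling `validation_metrics(records)` returns each metric as a pair of point estimate and interval, and `date_concordance(records)` returns one proportion per window. Subgroup results such as those by cancer type or age band would follow from filtering `records` before calling either function.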