e20033

Background: Real-world data (RWD) are increasingly used in oncology research. Yet a major limitation of RWD is missing data, which can generate misleading conclusions. Methods for handling missing data include excluding patients with missing variables, using machine-based or statistically imputed values, and using proxies (surrogates). Another limitation is the exclusion of deaths at time zero, which may lead to misleading conclusions when analyzing survival of patients with aggressive cancers. This work highlights the impact that data exclusion, variable surrogacy, and deaths at time zero have on survival analysis results.

Methods: ASCO’s CancerLinQ Discovery Multiple Myeloma (MM) dataset was used to assess overall survival (OS) in patients with MM diagnosed from 2009 to 2021. Of the 34,234 patients included in the analyses, 3,582 (10%) were missing a recorded date of MM diagnosis. In these cases, the date of first anti-myeloma therapy was used as a surrogate, since most MMs are treated at diagnosis. OS was compared between MM patients with a “known” diagnosis date and those with a surrogate, or “presumed”, diagnosis date. To assess how the inclusion of deaths at time zero affects OS, the data were first analyzed excluding patients who died within 1 month of second primary malignancy (SPM) diagnosis, including secondary AML (sAML). A second analysis of the same sample added a constant (0.5) to all survival times, allowing for inclusion of patients who died within 1 month of diagnosis. Analyses were conducted with Stata version 17.0 (StataCorp, College Station, TX).

Results: Despite the strong positive correlation between recorded MM diagnosis date and date of first anti-myeloma therapy, there was a statistically significant difference in survival between MM patients with a known vs. presumed date of diagnosis (median OS 115 vs. 45 months, HR 2.54, 95% CI 2.41-2.69, p < 0.001). Excluding vs. including deaths within 1 month of diagnosis produced a marked difference (nearly 1 year) in median OS from the date of diagnosis of any SPM (113 vs. 103.5 months) as well as of sAML (41 vs. 30.5 months).

Conclusions: Although RWD hold promise, oncologists must be aware of common pitfalls in survival analyses: missing data, variable surrogates, and the dropping of deaths at time zero. Patients with a recorded date of MM diagnosis appear to be fundamentally different from those who lack a date of diagnosis but do have a recorded date of anti-myeloma therapy. For aggressive malignancies, excluding patients who died at time zero can lead to overestimation of survival. Adding a small constant (0.5) to the time variable enables the inclusion of patients who die soon after their cancer diagnosis. When using RWD to guide clinical decision-making, it is important to be aware of these common threats to data validity, which can produce misleading results.
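
The effect of the time-zero handling described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' Stata code: the `km_median` helper, the toy cohort, and the assumption that survival time is recorded in whole months (so a death in the first month is stored as 0) are all hypothetical and chosen only for illustration. The median is taken as the smallest time at which the Kaplan-Meier estimate falls to 0.5 or below.

```python
from typing import List, Optional


def km_median(times: List[float], events: List[int]) -> Optional[float]:
    """Kaplan-Meier median: smallest time t with S(t) <= 0.5, or None if never reached."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all records that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
        if surv <= 0.5:
            return t
        n_at_risk -= leaving
    return None


# Hypothetical toy cohort: months from diagnosis; event 1 = death, 0 = censored.
times = [0, 0, 0, 1, 3, 6, 9, 14, 22, 30, 36, 48]
events = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]

# Analysis 1: records with time zero are dropped before estimation.
kept = [(t, e) for t, e in zip(times, events) if t > 0]
median_excluding = km_median([t for t, _ in kept], [e for _, e in kept])

# Analysis 2: add a constant (0.5) to every time so the early deaths stay in.
median_shifted = km_median([t + 0.5 for t in times], events)

print(median_excluding, median_shifted)  # prints: 30 9.5
```

In this toy cohort, dropping the three time-zero deaths inflates the median from 9.5 to 30 months. The gap is far larger than in the abstract because the made-up data deliberately contain a high share of early deaths; only the direction of the bias is meant to carry over.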