We present, to our knowledge, the first methodological study aimed at enhancing the prognostic power of Cox regression models, which are widely used in survival analysis, through optimized data selection. Our approach employs a novel two-stage mechanism: by framing prognostic stratum matching in an intuitive way, we select prognostically representative patient observations to create a more balanced training set. This enables the model to assign equal attention to distinct prognostic subgroups. We demonstrate the methodology using an observational dataset of 1,799 patients with resected colorectal cancer liver metastases, 1,197 of whom received adjuvant chemotherapy and 602 of whom did not. In line with current standard practice, the comparator was a prognostic model trained on the entire cohort ("model 1"). Models trained on the untreated and treated subgroups matched through our approach ("model 3A" and "model 3B", respectively) showed an improvement of up to 20% in bootstrapped C-indices compared with model 1. Notably, models 3A and 3B exhibited superior calibration, with a 6- to 10-fold improvement over model 1. Additional performance metrics were consistent with these findings, and robustness was confirmed through bias-corrected bootstrapping. Given the ongoing development of numerous linear prognostic models and the general applicability of our approach to any observational data, this method has significant potential to impact biomedical research and clinical practice wherever prognostic models are used.
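The comparison between models rests on the C-index (concordance index). The sketch below illustrates Harrell's C-index for right-censored data, together with a naive bootstrap average. It is a minimal illustration of the standard metric under stated assumptions, not the authors' implementation. The function names `harrell_c_index` and `bootstrap_c_index` are hypothetical. Pairs with tied follow-up times are skipped as a simplification. The bootstrap here only resamples a fixed set of risk scores and does not refit the Cox model, so it is *not* the bias-corrected (optimism-adjusted) bootstrap described in the abstract.

```python
import random
from typing import List, Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data (sketch).

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and a strictly shorter follow-up time than subject j.
    The pair is concordant when subject i receives the higher risk score;
    tied risk scores count as half-concordant. Pairs with tied times are
    skipped (a simplifying assumption).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_c_index(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float], n_boot: int = 200,
                      seed: int = 0) -> float:
    """Mean C-index over bootstrap resamples (hypothetical helper).

    Resamples subjects with replacement but keeps the risk scores fixed,
    i.e. no model refitting, so this is NOT an optimism-corrected bootstrap.
    """
    rng = random.Random(seed)
    n = len(times)
    scores: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            scores.append(harrell_c_index([times[k] for k in idx],
                                          [events[k] for k in idx],
                                          [risk[k] for k in idx]))
        except ValueError:
            continue  # skip degenerate resamples with no comparable pairs
    if not scores:
        raise ValueError("all bootstrap resamples were degenerate")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: shorter survival is paired with a higher predicted risk.
    t = [2, 4, 6, 8]
    e = [1, 1, 0, 1]  # subject at t=6 is censored
    r = [0.9, 0.7, 0.3, 0.1]
    print(harrell_c_index(t, e, r))    # 1.0 (all 5 comparable pairs concordant)
    print(bootstrap_c_index(t, e, r))  # 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a relative gain in this quantity reflects better ordering of patients by predicted risk. For production use, established implementations such as those in `lifelines` or `scikit-survival` handle ties and censoring more carefully.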