Evaluation of machine learning approaches for estimating individualized treatment regimens for time-to-event outcomes in observational studies

Abstract

In this paper we provide an overview and evaluation of machine learning methods for estimating individualized treatment regimens (ITR) for time-to-event outcomes through maximizing restricted mean survival time (RMST) in observational studies with non-randomized treatment assignment. We present extensive simulation studies that closely mimicked real-world data under a set of scenarios representing different degrees of alignment between the observed regimen, which reflects the actual prescribing practice, and the optimal ITR. The simulation results include performance characteristics of the candidate methods in terms of their ability to recover the optimal ITR and various empirical measures of RMST gain based on the comparison between the estimated ITR and the actual prescribing practice. Direct methods for estimating ITR for survival outcomes did not show advantages over indirect methods based on predicted potential outcomes under our implementation and simulation settings. Among indirect methods, gradient boosting for estimating potential survival outcomes has an advantage over random survival forests and parametric methods. Estimating the value of estimated ITRs and associated gains over the actual prescribing practice remains a challenging problem, especially under complex confounding scenarios when either most patients do not receive optimal treatment or when the actual treatment assignments are already close to optimal.
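The indirect approach evaluated in the abstract can be illustrated concretely: predict each patient's potential survival curve under each treatment, integrate it up to a horizon tau to obtain a predicted RMST, and assign whichever treatment maximizes it. A minimal NumPy sketch, with the survival model that produces the curves left as an assumption (the abstract does not specify one):

```python
import numpy as np

def rmst_from_curve(times, surv, tau):
    """Area under a right-continuous step survival curve S(t) on [0, tau].

    `times` are sorted grid times and `surv` the survival probabilities
    at those times; S(t) = 1 before the first grid time.
    """
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    keep = times < tau
    t = np.concatenate([[0.0], times[keep], [tau]])
    s = np.concatenate([[1.0], surv[keep]])
    return float(np.sum(np.diff(t) * s))

def indirect_itr(times, surv0, surv1, tau):
    """Indirect ITR: pick the treatment with the larger predicted RMST.

    `surv0` / `surv1` are (n_patients, n_times) predicted survival
    curves under control and treatment, e.g. from a survival model
    refit with the treatment indicator set to 0 and to 1.
    """
    r0 = np.array([rmst_from_curve(times, s, tau) for s in surv0])
    r1 = np.array([rmst_from_curve(times, s, tau) for s in surv1])
    return (r1 > r0).astype(int), r0, r1
```

Direct methods in the comparison instead search over treatment rules to maximize an estimated value function, rather than going through predicted potential outcomes as above.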

Similar Papers
  • Abstract
  • 10.1182/blood-2023-188855
Azacitidine Treatment in MDS: A Systematic Literature Review and Meta-Analysis Comparing the Efficacy of Real World Data with Randomized Controlled Trials
  • Nov 2, 2023
  • Blood
  • Livia Rupp + 3 more

  • Research Article
  • Citations: 27
  • 10.1001/jamaophthalmol.2020.5743
Random Survival Forests Analysis of Intraoperative Complications as Predictors of Descemet Stripping Automated Endothelial Keratoplasty Graft Failure in the Cornea Preservation Time Study
  • Dec 23, 2020
  • JAMA Ophthalmology
  • Robert C O’Brien + 3 more

Importance: A new analytic method can evaluate factors of interest associated with graft failure after Descemet stripping automated endothelial keratoplasty (DSAEK), or more generally in any ophthalmic surgical setting with a time-to-event outcome. Objective: To reanalyze types of intraoperative complications associated with DSAEK graft failure in the Cornea Preservation Time Study using random survival forests. Design, Setting, and Participants: This cohort study, initially conceived in April 2019, used a prediction model to conduct a post hoc secondary analysis of data collected in a multicenter, double-masked, randomized clinical trial. Forty US clinical sites with 70 surgeons participated, with donor corneas provided by 23 US eye banks. The study included 1090 participants, representing 1330 eyes, undergoing DSAEK for Fuchs dystrophy (1255 eyes [94.4%]) or pseudophakic or aphakic corneal edema (75 eyes [5.6%]). Enrollment occurred between April 16, 2012, and February 20, 2014, and follow-up ended June 5, 2017. Statistical analysis was performed from July 10, 2019, to May 29, 2020. Interventions: Descemet stripping automated endothelial keratoplasty with random assignment of a donor cornea with a preservation time of 7 days or less or of 8 to 14 days. Main Outcomes and Measures: Ranked variable importance for intraoperative complications among 50 donor, recipient, and eye bank variables and restricted mean survival time through 47 months (1434 days) after DSAEK were examined. Random survival forests, a nonparametric method with less restrictive model assumptions that is far more flexible in its ability to model nonlinear effects and interactions, were used to analyze the data. Results: This study included 1090 participants (663 women [60.8%]; median age, 70 years [range, 42-90 years]), representing 1330 eyes. Random survival forests ranked a DSAEK intraoperative complication as the third most predictive factor of graft failure, after surgeon and eye bank, in the final model with 5 predictors. In the first 47 months after DSAEK, the estimated mean difference in restricted mean survival time for grafts that experienced a DSAEK intraoperative complication vs those that did not was -227 days (99% CI, -352 to -70 days) based on the final RSF model. Conclusions and Relevance: These findings, while post hoc, support the hypothesis that random survival forests allow for an improved analytic approach for identifying factors predictive of graft failure and for obtaining adjusted graft survival estimates. Random survival forests offer the opportunity to guide the development of future population-based cohort ophthalmic surgical studies, establishing definitive factors for procedural success.

  • Research Article
  • Citations: 13
  • 10.1080/10543406.2017.1380036
Estimating the Optimal Personalized Treatment Strategy Based on Selected Variables to Prolong Survival via Random Survival Forest with Weighted Bootstrap
  • Oct 25, 2017
  • Journal of Biopharmaceutical Statistics
  • Jincheng Shen + 5 more

A personalized treatment policy requires defining the optimal treatment for each patient based on their clinical and other characteristics. Here we consider a situation commonly encountered when analyzing data from observational cohorts: there are auxiliary variables that affect both the treatment and the outcome, yet these variables are not of primary interest and should not be included in a generalizable treatment strategy. Furthermore, there is not enough prior knowledge of the effect of the treatments or of the importance of the covariates to explicitly specify the dependency between the outcome and the different covariates, so we choose a model flexible enough to accommodate the possibly complex association between the outcome and the covariates. We consider observational studies with a survival outcome and propose to use Random Survival Forest with Weighted Bootstrap (RSFWB) to model the counterfactual outcomes while marginalizing over the auxiliary covariates. By maximizing the restricted mean survival time, we estimate the optimal regime for a target population based on a selected set of covariates. Simulation studies illustrate that the proposed method performs reliably across a range of different scenarios. We further apply RSFWB to a prostate cancer study.

  • Research Article
  • Citations: 41
  • 10.1080/01621459.2014.960968
Bahadur Efficiency of Sensitivity Analyses in Observational Studies
  • Jan 2, 2015
  • Journal of the American Statistical Association
  • Paul R Rosenbaum

An observational study draws inferences about treatment effects when treatments are not randomly assigned, as they would be in a randomized experiment. The naive analysis of an observational study assumes that adjustments for measured covariates suffice to remove bias from nonrandom treatment assignment. A sensitivity analysis in an observational study determines the magnitude of bias from nonrandom treatment assignment that would need to be present to alter the qualitative conclusions of the naive analysis, say leading to the acceptance of a null hypothesis rejected in the naive analysis. Observational studies vary greatly in their sensitivity to unmeasured biases, but a poor choice of test statistic can lead to an exaggerated report of sensitivity to bias. The Bahadur efficiency of a sensitivity analysis is introduced, calculated, and connected to established concepts, such as the power of a sensitivity analysis and the design sensitivity. The Bahadur slope equals zero when the sensitivity parameter equals the design sensitivity, but the Bahadur slope permits more refined distinctions. Specifically, the Bahadur relative efficiency can also compare the relative performance of two test statistics at a value of the sensitivity parameter below the minimum of their design sensitivities. Adaptive procedures that combine several tests can achieve the best design sensitivity and the best Bahadur slope of their component tests. Ultimately, in sufficiently large sample sizes, design sensitivity is more important than efficiency for the power of a sensitivity analysis, and the exponential rate at which design sensitivity overtakes efficiency is characterized.

  • Research Article
  • Citations: 29
  • 10.1016/j.jsps.2020.08.011
Influence of nanofiber alignment on the release of a water-soluble drug from cellulose acetate nanofibers
  • Aug 17, 2020
  • Saudi Pharmaceutical Journal : SPJ
  • Prasopchai Patrojanasophon + 4 more

  • Research Article
  • 10.1200/jco.2025.43.16_suppl.e20598
Comparative analysis of immunotherapy treatments in non-small cell lung cancer (NSCLC) using novel causal machine learning approaches.
  • Jun 1, 2025
  • Journal of Clinical Oncology
  • Dan Goldstaub + 7 more

e20598 Background: While immune checkpoint inhibitors (ICIs) have revolutionized non-small cell lung cancer (NSCLC) treatment, predicting immune-related adverse events (irAEs) remains challenging. We conducted a retrospective analysis comparing pembrolizumab (Pembro) versus ipilimumab plus nivolumab (Ipi/Nivo) to evaluate treatment effect heterogeneity and identify factors driving differential irAE risk. Methods: We analyzed data from 174 advanced NSCLC patients treated with chemotherapy plus either Pembro (n=71) or Ipi/Nivo (n=103). Using overlap weighting with logistic regression and k-fold cross-validation, we addressed potential bias from non-randomized treatment assignment. We estimated both average treatment effects (ATE) and conditional average treatment effects (CATE) using causal machine learning models. Heterogeneity in treatment effect (HTE) was assessed using the ABC-D (Area Between Curves - Double) test, with subsequent SHAP (SHapley Additive exPlanations) analysis to identify key factors driving heterogeneity. Results: After propensity adjustment reducing average imbalance from 28% to <3%, the efficacy analysis showed a non-significant ATE of 0.11 months [95% CI: -2.88, 3.10] in restricted mean survival time at 24 months favoring Ipi/Nivo. For irAEs, observed rates were 72% with Ipi/Nivo versus 51% with Pembro. The adjusted irAE analysis demonstrated significantly lower risk with Pembro (absolute risk reduction: 0.21 [95% CI: 0.06, 0.35]). The ABC-D test revealed significant heterogeneity in irAE occurrence (p=0.046). SHAP analysis identified smoking history and contralateral lung metastases as factors associated with greater Pembro benefit, while lymph node involvement was associated with reduced Pembro advantage. Conclusions: Our retrospective analysis using advanced causal machine learning techniques demonstrated significant patient-level heterogeneity in irAE risk between Pembro and Ipi/Nivo treatments in advanced NSCLC. 
While Pembro showed lower overall irAE risk, specific clinical factors modified the magnitude of this benefit. These findings may help inform personalized immunotherapy selection, though validation in prospective studies is warranted.
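The overlap-weighting step described in the Methods has a simple closed form: with estimated propensity scores e(x), treated patients receive weight 1 - e(x) and controls receive weight e(x), which emphasizes patients who plausibly could have received either regimen. A minimal sketch, assuming the propensity scores have already been estimated (e.g. by the logistic regression the authors describe, which is not shown):

```python
import numpy as np

def overlap_weights(propensity, treated):
    """Overlap weights: 1 - e(x) for treated patients, e(x) for controls."""
    propensity = np.asarray(propensity, dtype=float)
    treated = np.asarray(treated, dtype=int)
    return np.where(treated == 1, 1.0 - propensity, propensity)

def weighted_ate(outcome, treated, weights):
    """Difference of overlap-weighted outcome means (treated - control)."""
    outcome = np.asarray(outcome, dtype=float)
    t = np.asarray(treated, dtype=bool)
    mu1 = np.average(outcome[t], weights=weights[t])
    mu0 = np.average(outcome[~t], weights=weights[~t])
    return mu1 - mu0
```

In the study the weighted contrast was taken on RMST at 24 months rather than a simple outcome mean; the weighting logic is the same.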

  • Research Article
  • 10.1186/s12874-025-02551-z
Flexible quantitative bias analysis for unmeasured confounding in subject-level indirect treatment comparisons with proportional hazards violation
  • May 10, 2025
  • BMC Medical Research Methodology
  • Steven Soutar + 4 more

Background: Indirect treatment comparisons can provide evidence of relative efficacy for novel therapies when implementation of a randomised controlled trial is infeasible. However, such comparisons are vulnerable to unmeasured confounding bias due to incomplete data collection and non-random treatment assignment. Quantitative bias analysis (QBA) is a framework used to assess the sensitivity of a study’s conclusions to unmeasured confounding. As indirect comparisons between therapies with differing treatment modalities may result in violation of the proportional hazards (PH) assumption, QBA methods that are applicable in this context are required. However, few QBA methods are valid under PH violation. Methods: We proposed a simulation-based QBA framework which quantifies the sensitivity of the difference in restricted mean survival time (dRMST) to unmeasured confounding, and is therefore valid under violation of the PH assumption. The proposed framework utilises Bayesian data augmentation for the multiple imputation of an unmeasured confounder with user-specified characteristics. Adjustment of the dRMST is then implemented in a weighted analysis using the imputed values. The accuracy and precision of our proposed imputation-based adjustment method were assessed through a simulation study. Confounded data were simulated using a common non-PH data generating process, and imputation-based effect estimates were compared against estimates obtained following adjustment for all confounders. Implementation of the proposed QBA framework was also illustrated using data from an external control arm study demonstrating clear PH violation. Results: Imputation-based adjustment using Bayesian data augmentation was observed to estimate the true adjusted dRMST with minimal bias. Moreover, the bias was comparable to that observed under adjustment when all confounders were measured.
Application of the proposed QBA framework to an indirect treatment comparison study enabled identification of the characteristics of an unmeasured confounder that would be required to nullify the study’s conclusions. Conclusions: Imputation-based adjustment can accurately recover the true adjusted dRMST in the presence of unmeasured confounding with known exposure and outcome associations. Therefore, the proposed QBA framework can correctly determine the characteristics required by an unmeasured confounder to invalidate a study’s conclusions. Consequently, this framework enables the construction of sensitivity analyses to investigate the robustness of relative efficacy evidence derived from indirect treatment comparisons which exhibit PH violation.

  • Research Article
  • Citations: 36
  • 10.1186/s12859-019-2942-y
Block Forests: random forests for blocks of clinical and omics covariate data
  • Jun 27, 2019
  • BMC Bioinformatics
  • Roman Hornung + 1 more

Background: In recent years, more and more multi-omics data have become available, that is, data featuring measurements of several types of omics data for each patient. Using multi-omics data as covariate data in outcome prediction is both promising and challenging due to the complex structure of such data. Random forest is a prediction method known for its ability to render complex dependency patterns between the outcome and the covariates. Against this background, we developed five candidate random forest variants tailored to multi-omics covariate data. These variants modify the split point selection of random forest to incorporate the block structure of multi-omics data and can be applied to any outcome type for which a random forest variant exists, such as categorical, continuous and survival outcomes. Using 20 publicly available multi-omics data sets with survival outcomes, we compared the prediction performances of the block forest variants with alternatives. We also considered the common special case of having clinical covariates and measurements of a single omics data type available. Results: We identified one variant, termed “block forest”, that outperformed all other approaches in the comparison study. In particular, it performed significantly better than standard random survival forest (adjusted p-value: 0.027). The two best performing variants have in common that the block choice is randomized in the split point selection procedure. In the case of having clinical covariates and a single omics data type available, the improvements of the variants over random survival forest were larger than in the case of the multi-omics data. The degrees of improvement over random survival forest varied strongly across data sets. Moreover, considering all clinical covariates mandatorily improved the performance.
This result should, however, be interpreted with caution, because the level of predictive information contained in clinical covariates depends on the specific application. Conclusions: The new prediction method block forest for multi-omics data can significantly improve the prediction performance of random forest and outperformed alternatives in the comparison. Block forest is particularly effective for the special case of using clinical covariates in combination with measurements of a single omics data type.
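The abstract notes that the two best variants randomize the block choice during split point selection. A hypothetical sketch of that idea, not the authors' exact procedure: at each split, draw one block uniformly at random, then sample the usual mtry candidate features from within it.

```python
import numpy as np

def candidate_features(blocks, mtry, rng=None):
    """Randomized block choice for a single split: draw a block
    uniformly at random, then sample up to `mtry` candidate split
    features from within it.

    `blocks` maps block names (e.g. "clinical", "mRNA") to lists of
    column indices. This is an illustrative sketch of the randomized
    block choice, not the exact block forest splitting rule.
    """
    if rng is None:
        rng = np.random.default_rng()
    name = rng.choice(list(blocks))
    cols = np.asarray(blocks[name])
    k = min(mtry, len(cols))
    return str(name), rng.choice(cols, size=k, replace=False)
```

Restricting each split's candidates to one block is what lets the forest exploit the block structure instead of letting a large omics block crowd out a small clinical block.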

  • Research Article
  • 10.18502/jbe.v6i1.4760
Review of Random Survival Forest method
  • Nov 25, 2020
  • Journal of Biostatistics and Epidemiology
  • Majid Rezaei + 4 more

Background: Over the past years, there has been a great deal of interest in applying statistical machine learning methods to survival analysis. Ensemble-based methods, especially random survival forest, have been developed in various fields, especially the medical sciences, due to their high accuracy, non-parametric nature, and applicability to high-dimensional data sets. This paper aims to provide a methodological review of random survival forests and how to use them in the analysis of right-censored survival data. Method: We present a review article based on the latest research in the PubMed database on random survival forest model methodology. Results: This article begins with an introduction to tree-based methods, ensemble algorithms, and the random forest (RF) method, followed by the random survival forest framework, bootstrapped data and out-of-bag (OOB) ensemble estimators, a review of performance evaluation indicators, how to select important variables, and other advanced topics of random survival forests for time-to-event data. Conclusion: When analyzing high-dimensional right-censored survival data, where the relationships between variables are complex and their interactions must be taken into account, the nonparametric random survival forest (RSF) method identifies the important variables affecting survival times with high accuracy and speed, without relying on restrictive model assumptions.
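The bootstrap and out-of-bag machinery the review covers can be sketched in a few lines: each tree is grown on a bootstrap resample, and the observations left out of that resample (on average about 37% of them) form the OOB set used for honest ensemble error estimation.

```python
import numpy as np

def bootstrap_oob(n, rng=None):
    """Draw one bootstrap resample of n indices and return it together
    with its out-of-bag (OOB) complement. Each tree is grown on the
    in-bag indices; predictions on OOB indices give an honest error
    estimate without a separate validation set.
    """
    if rng is None:
        rng = np.random.default_rng()
    in_bag = rng.integers(0, n, size=n)          # sample with replacement
    oob = np.setdiff1d(np.arange(n), in_bag)     # indices never drawn
    return in_bag, oob
```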

  • Research Article
  • Citations: 4
  • 10.1177/0962280218785926
Doubly robust weighted log-rank tests and Renyi-type tests under non-random treatment assignment and dependent censoring.
  • Jul 9, 2018
  • Statistical Methods in Medical Research
  • Chenxi Li

The log-rank test is widely used to test for a difference in event time distributions between treatment groups. However, if subjects are not randomly assigned to treatment groups, which is often the case in observational studies, the log-rank test is not asymptotically correct for detecting group survival difference due to the imbalance of confounding variables between groups. We develop a class of modified weighted log-rank tests and Renyi-type tests for two-sample survival comparison under non-random treatment assignment. The new tests can also account for non-random censoring that depends on baseline covariates. The proposed methods involve building working models for treatment assignment, cause-specific hazard of dependent censoring, and the time to event. We prove that, when either the models for treatment assignment and dependent censoring or the model for the event time is true, the new tests are asymptotically correct, i.e. being doubly robust. Numerical experiments demonstrate the tests' double-robustness property in finite samples of realistic sizes, and also show that the doubly robust log-rank test is at least as powerful as the regular log-rank test when the treatment assignment is random and there is no dependent censoring. An application to a kidney transplant data set illustrates the utility of the proposed methods.

  • Research Article
  • Citations: 41
  • 10.1016/j.eururo.2019.05.037
Programmed Death-1 or Programmed Death Ligand-1 Blockade in Patients with Platinum-resistant Metastatic Urothelial Cancer: A Systematic Review and Meta-analysis
  • Jun 11, 2019
  • European urology
  • Scot A Niglio + 14 more

  • Research Article
  • 10.1002/sim.70031
Modeling the Restricted Mean Survival Time Using Pseudo‐Value Random Forests
  • Feb 22, 2025
  • Statistics in Medicine
  • Alina Schenk + 2 more

The restricted mean survival time (RMST) has become a popular measure to summarize event times in longitudinal studies. Defined as the area under the survival function up to a time horizon τ>0, the RMST can be interpreted as the life expectancy within the time interval [0,τ]. In addition to its straightforward interpretation, the RMST allows for the definition of valid estimands for the causal analysis of treatment contrasts in medical studies. In this work, we introduce a non‐parametric approach to model the RMST conditional on a set of baseline variables (including, e.g., treatment variables and confounders). Our method is based on a direct modeling strategy for the RMST, using leave‐one‐out jackknife pseudo‐values within a random forest regression framework. In this way, it can be employed to obtain precise estimates of both patient‐specific RMST values and confounder‐adjusted treatment contrasts. Since our method (termed “pseudo‐value random forest”, PVRF) is model‐free, RMST estimates are not affected by restrictive assumptions like the proportional hazards assumption. Particularly, PVRF offers a high flexibility in detecting relevant covariate effects from higher‐dimensional data, thereby expanding the range of existing pseudo‐value modeling techniques for RMST estimation. We investigate the properties of our method using simulations and illustrate its use by an application to data from the SUCCESS‐A breast cancer trial. Our numerical experiments demonstrate that PVRF yields accurate estimates of both patient‐specific RMST values and RMST‐based treatment contrasts.
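The jackknife pseudo-value construction at the core of this approach replaces each patient's possibly censored outcome with theta_i = n * theta_hat - (n - 1) * theta_hat_(-i), where theta_hat is the Kaplan-Meier estimate of the RMST. A minimal NumPy sketch of that step; the random forest regression on the resulting pseudo-values (the paper's contribution) is not shown:

```python
import numpy as np

def km_rmst(time, event, tau):
    """Kaplan-Meier estimate of the RMST: the area under the estimated
    survival curve on [0, tau]. `event` is 1 for an event, 0 for censoring."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    uniq = np.unique(time[event == 1])           # distinct event times
    surv, s = [], 1.0
    for t in uniq:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    grid = np.concatenate([[0.0], uniq[uniq < tau], [tau]])
    steps = np.concatenate([[1.0], np.asarray(surv)[uniq < tau]])
    return float(np.sum(np.diff(grid) * steps))

def rmst_pseudo_values(time, event, tau):
    """Leave-one-out jackknife pseudo-values for the RMST:
    theta_i = n * theta_hat - (n - 1) * theta_hat_without_i.
    These become the regression targets for the (not shown) forest."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    full = km_rmst(time, event, tau)
    loo = np.array([km_rmst(np.delete(time, i), np.delete(event, i), tau)
                    for i in range(n)])
    return n * full - (n - 1) * loo
```

A useful sanity check: with no censoring, the pseudo-values reduce to min(T_i, tau), the patient's own truncated event time.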

  • Research Article
  • Citations: 120
  • 10.1097/jto.0b013e318233d835
Random Survival Forests
  • Dec 1, 2011
  • Journal of Thoracic Oncology
  • Jeremy M.G Taylor

  • Discussion
  • Citations: 1
  • 10.2215/cjn.15021121
Waitlist Mortality for Second Kidney Transplants.
  • Jan 1, 2022
  • Clinical Journal of the American Society of Nephrology
  • Mohammad Kazem Fallahzadeh + 1 more

  • Research Article
  • Citations: 6
  • 10.1200/jco.2019.37.15_suppl.9087
Pembrolizumab alone or with chemotherapy for PD-L1 positive NSCLC: A network meta-analysis of randomized trials.
  • May 20, 2019
  • Journal of Clinical Oncology
  • Mark Doherty + 4 more

9087 Background: Pembrolizumab (P) has replaced chemotherapy (C) as first-line treatment for advanced non-small cell lung cancer (NSCLC) with tumor PD-L1 expression ≥50%. Among PD-L1 unselected patients, P+C is superior to C alone. This network meta-analysis compared P alone with P+C in patients with ≥50% PD-L1 positive NSCLC. Methods: An indirect network was constructed to compare P and P+C through the control arms of the Keynote 024, 189 and 407 (PD-L1 ≥50% subgroup) trials. Baseline characteristics and chemotherapy outcomes were examined for heterogeneity. Overall survival (OS), progression-free survival (PFS), objective response rate (ORR) and toxicities including immune-related adverse events (irAE) were extracted from trial results. Toxicity results were unavailable for the PD-L1 ≥50% subgroups of KN 189 & 407, so overall study results were used. Survival outcomes are expressed as hazard ratios (HRs) or restricted mean survival time (RMST) ratios, and toxicity and ORR as risk difference (RD). Results: 507 patients were included: 154 on P, 430 on C and 483 on P+C. Patient characteristics across trials were similar in age, sex, performance status and smoking history. All trials had similar chemotherapy outcomes (PFS 6, 4.9, 4.8 mos) suggesting similar populations. Network meta-analysis showed no difference between P+C and C alone in OS (HR 0.85, 95%CI 0.45-1.59, p = 0.60) or PFS (HR 0.73, 95%CI 0.48-1.1, p = 0.13), but P+C was associated with higher ORR (+16.9%, 95%CI 0.7-33%, p = 0.04). RMST analysis suggested fewer early PFS events with P+C (0-6 mo RMST ratio 1.25, RMST difference 1.02 mo, p = 0.002), with the difference disappearing at 1 year (0-12 mo RMST ratio 1.16, p = 0.07). No difference in RMST for OS was found. Overall toxicities, hematologic and grade 3-5 toxicities were higher with P+C compared with P alone (table).
Conclusions: Among patients with ≥50% PD-L1 positive NSCLC, P+C did not improve OS or PFS compared with P alone, but was associated with higher ORR. RMST analysis suggested fewer early progression events using P+C. [Table: see text]
