Assessing delayed treatment benefits of immunotherapy using long-term average hazard: a novel test/estimation approach
Delayed treatment effects on time-to-event outcomes are commonly observed in randomized controlled trials of cancer immunotherapies. When the treatment effect has a delayed onset, the conventional test/estimation approach (using the log-rank test for between-group comparison and Cox’s hazard ratio to quantify the treatment effect) can be suboptimal. The log-rank test may lack power in such scenarios, and the interpretation of the hazard ratio is often ambiguous. Recently, alternative test/estimation approaches have been proposed to address these limitations. One such approach is based on long-term restricted mean survival time (LT-RMST), while another is based on average hazard with survival weight (AH-SW). This paper integrates these two concepts and introduces a novel long-term average hazard (LT-AH) approach with survival weight for both hypothesis testing and estimation. Numerical studies highlight specific scenarios where the proposed LT-AH method achieves higher power than the existing alternatives. The LT-AH for each group can be estimated nonparametrically, and the proposed between-group comparison maintains test/estimation coherency. Because the difference and ratio of LT-AH do not rely on model assumptions about the relationship between two groups, the LT-AH approach provides a robust framework for estimating the magnitude of between-group differences. Furthermore, LT-AH allows for treatment effect quantification in both absolute (difference in LT-AH) and relative (ratio of LT-AH) terms, aligning with guideline recommendations and addressing practical needs. Given its interpretability and improved power in certain settings, the proposed LT-AH approach offers a useful alternative to conventional hazard-based methods, particularly when delayed treatment effects are expected. Supplementary Information: The online version contains supplementary material available at 10.1007/s10985-025-09671-0.
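To make the LT-AH idea concrete, here is a minimal Python sketch. It assumes the LT-AH over a window [tau1, tau2] takes the form (S(tau1) - S(tau2)) divided by the integral of S(t) from tau1 to tau2; this formula is an assumption based on the usual survival-weighted average hazard and is not spelled out in the abstract. The function names, the hazards (0.10 and 0.05 per month), the 6-month lag, and the 6 to 24 month window are all hypothetical illustration values.

```python
import math

def lt_average_hazard(surv, tau1, tau2, steps=10_000):
    """Assumed LT-AH: (S(tau1) - S(tau2)) / integral of S over [tau1, tau2].

    The integral is approximated with the trapezoid rule on `steps` intervals.
    """
    h = (tau2 - tau1) / steps
    area = 0.5 * (surv(tau1) + surv(tau2))
    for k in range(1, steps):
        area += surv(tau1 + k * h)
    area *= h
    return (surv(tau1) - surv(tau2)) / area

def control(t):
    # Hypothetical control arm: constant hazard of 0.10 per month.
    return math.exp(-0.10 * t)

def delayed(t, lag=6.0):
    # Hypothetical treated arm: same hazard as control until `lag`,
    # then the hazard is halved (a simple delayed-effect pattern).
    if t <= lag:
        return math.exp(-0.10 * t)
    return math.exp(-0.10 * lag - 0.05 * (t - lag))

ah_control = lt_average_hazard(control, 6.0, 24.0)
ah_treated = lt_average_hazard(delayed, 6.0, 24.0)
ratio = ah_treated / ah_control        # relative effect (ratio of LT-AH)
difference = ah_treated - ah_control   # absolute effect (difference in LT-AH)
```

Under this assumed definition, a constant hazard over the window is returned unchanged, which is why starting the window at the lag isolates the late-period hazard in each arm. In practice the survival functions would be replaced by nonparametric estimates such as Kaplan-Meier curves.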
- Research Article
4
- 10.1002/sim.9662
- Jan 18, 2023
- Statistics in Medicine
The pattern of the difference between two survival curves we often observe in randomized clinical trials for evaluating immunotherapy is not proportional hazards; the treatment effect typically appears several months after the initiation of the treatment (ie, delayed difference pattern). The commonly used logrank test and hazard ratio estimation approach will be suboptimal concerning testing and estimation for those trials. The long-term restricted mean survival time (LT-RMST) approach is a promising alternative for detecting the treatment effect that potentially appears later in the study. A challenge in employing the LT-RMST approach is that it must specify a lower end of the time window in addition to a truncation time point that the RMST requires. There are several investigations and suggestions regarding the choice of the truncation time point for the RMST. However, little has been investigated to address the choice of the lower end of the time window. In this paper, we propose a flexible LT-RMST-based test/estimation approach that does not require users to specify a lower end of the time window. Numerical studies demonstrated that the potential power loss by adopting this flexibility was minimal, compared to the standard LT-RMST approach using a prespecified lower end of the time window. The proposed method is flexible and can offer higher power than the RMST-based approach when the delayed treatment effect is expected. Also, it provides a robust estimate of the magnitude of the treatment effect and its confidence interval that corresponds to the test result.
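As an illustration of the quantity involved, the following Python sketch computes a Kaplan-Meier curve and the area under it between a lower end tau1 and a truncation time tau2, which is the LT-RMST for a prespecified window. It does not implement the flexible procedure proposed in the paper (which avoids prespecifying the lower end); the function names `kaplan_meier` and `lt_rmst` and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = event, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def lt_rmst(curve, tau1, tau2):
    """Area under the Kaplan-Meier step function between tau1 and tau2."""
    prev_t, prev_s = tau1, 1.0
    for t, s in curve:
        if t <= tau1:
            prev_s = s  # survival level in force at tau1
    area = 0.0
    for t, s in curve:
        if t <= tau1:
            continue
        if t >= tau2:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau2 - prev_t)
    return area

# Toy data: four subjects, one censored at time 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
toy_lt_rmst = lt_rmst(toy_curve, 1, 4)
```

Setting tau1 to 0 recovers the ordinary RMST up to tau2, so the only extra choice the window version needs is the lower end, which is the choice the paper aims to remove.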
- Research Article
7
- 10.1002/pst.2092
- Jan 11, 2021
- Pharmaceutical Statistics
The standard log-rank test has been extended by adopting various weight functions. Cancer vaccine or immunotherapy trials have shown a delayed onset of effect for the experimental therapy. This is manifested as a delayed separation of the survival curves. This work proposes new weighted log-rank tests to account for such delay. The weight function is motivated by the time-varying hazard ratio between the experimental and the control therapies. We implement a numerical evaluation of the Schoenfeld approximation (NESA) for the mean of the test statistic. The NESA enables us to assess the power and to calculate the sample size for detecting such delayed treatment effect and also for a more general specification of the non-proportional hazards in a trial. We further show a connection between our proposed test and the weighted Cox regression. Then the average hazard ratio using the same weight is obtained as an estimand of the treatment effect. Extensive simulation studies are conducted to compare the performance of the proposed tests with the standard log-rank test and to assess their robustness to model mis-specifications. Our tests outperform the Gρ,γ class in general and have performance close to the optimal test. We demonstrate our methods on two cancer immunotherapy trials.
- Abstract
- 10.1136/jitc-2024-sitc2024.1436
- Nov 1, 2024
- Journal for ImmunoTherapy of Cancer
Background: In the past decade, along with successful clinical development for novel immunotherapy in oncology, a delayed onset of treatment effect that is associated with the mechanism of action of immunotherapy...
- Research Article
16
- 10.1093/jnci/djz030
- Mar 5, 2019
- JNCI: Journal of the National Cancer Institute
The treatment effect in survival analysis is commonly quantified as the hazard ratio, and tested statistically using the standard log-rank test. Modern anticancer immunotherapies are successful in a proportion of patients who remain alive even after a long-term follow-up. This new phenomenon induces a nonproportionality of the underlying hazards of death. The properties of the net survival benefit were illustrated using the dataset from a trial evaluating ipilimumab in metastatic melanoma. The net survival benefit was then investigated through simulated datasets under typical scenarios of proportional hazards, delayed treatment effect, and cure rate. The net survival benefit test was computed according to the value of the minimal survival difference considered clinically relevant. As comparators, the standard and the weighted log-rank tests were also performed. In the illustrative dataset, the net survival benefit favored ipilimumab [Δ(0) = 15.8%, 95% confidence interval = 4.6% to 27.3%, P = .006]. This favorable effect was maintained when the analysis was focused on long-term survival differences (eg, >12 months) [Δ(12) = 12.5%, 95% confidence interval = 4.4% to 20.6%, P = .002]. Under the scenarios of a delayed treatment effect and cure rate, the power of the net survival benefit test compared favorably to the standard log-rank test power and was comparable to the power of the weighted log-rank test for large values of the threshold of clinical relevance. The net long-term survival benefit is a measure of treatment effect that is meaningful whether or not hazards are proportional. The associated statistical test is more powerful than the standard log-rank test when a delayed treatment effect is anticipated.
- Research Article
9
- 10.1002/pst.2003
- Feb 24, 2020
- Pharmaceutical Statistics
The indirect mechanism of action of immunotherapy causes a delayed treatment effect, producing delayed separation of survival curves between the treatment groups, and violates the proportional hazards assumption. Therefore, using the log-rank test in immunotherapy trial design could result in a severe loss of efficiency. Although few statistical methods are available for immunotherapy trial design that incorporates a delayed treatment effect, Ye and Yu recently proposed the use of a maximin efficiency robust test (MERT) for the trial design. The MERT is a weighted log-rank test that puts less weight on early events and full weight after the delayed period. However, the weight function of the MERT involves an unknown function that has to be estimated from historical data. Here, for simplicity, we propose the use of an approximated maximin test, the V0 test, which is the sum of the log-rank test for the full data set and the log-rank test for the data beyond the lag time point. The V0 test fully uses the trial data and is more efficient than the log-rank test when a lag exists, with relatively little efficiency loss when no lag exists. The sample size formula for the V0 test is derived. Simulations are conducted to compare the performance of the V0 test to the existing tests. A real trial is used to illustrate cancer immunotherapy trial design with delayed treatment effect.
- Research Article
3
- 10.1002/pst.1982
- Nov 15, 2019
- Pharmaceutical Statistics
A challenge arising in cancer immunotherapy trial design is the presence of a delayed treatment effect, under which the proportional hazards assumption no longer holds. As a result, a traditional survival trial design based on the standard log-rank test, which ignores the delayed treatment effect, will lead to a substantial loss of statistical power. Recently, a piecewise weighted log-rank test was proposed to take the delayed treatment effect into account in the trial design. However, because the sample size formula was derived under a sequence of local alternative hypotheses, it results in an underestimated sample size when the hazard ratio is relatively small for a balanced trial design and an inaccurate sample size estimation for an unbalanced design. In this article, we derived a new sample size formula under a fixed alternative hypothesis for the delayed treatment effect model. Simulation results show that the new formula provides accurate sample size estimation for both balanced and unbalanced designs.
- Research Article
- 10.1016/j.cct.2025.107860
- May 1, 2025
- Contemporary clinical trials
Impact of informative censoring on estimation and testing in randomized trials with delayed treatment effects.
- Research Article
12
- 10.1002/sim.8440
- Nov 26, 2019
- Statistics in Medicine
Cancer immunotherapy trials have two special features: a delayed treatment effect and a cure rate. Both features violate the proportional hazard model assumption and ignoring either one of the two features in an immunotherapy trial design will result in substantial loss of statistical power. To properly design immunotherapy trials, we proposed a piecewise proportional hazard cure rate model to incorporate both delayed treatment effect and cure rate into the trial design consideration. A sample size formula is derived for a weighted log-rank test under a fixed alternative hypothesis. The accuracy of sample size calculation using the new formula is assessed and compared with the existing methods via simulation studies. A real immunotherapy trial is used to illustrate the study design along with practical consideration of balance between sample size and follow-up time.
- Research Article
9
- 10.1080/19466315.2016.1207560
- Jul 2, 2016
- Statistics in Biopharmaceutical Research
Abstract: In study designs for randomized clinical trials with a survival endpoint, the log-rank test is commonly used with the treatment effect under a proportional hazards assumption. Recently, treatment effects in cancer immunotherapy trials have exhibited a delayed effect pattern with late separation of survival curves, raising challenges to the use of conventional study design hypotheses and analysis. In particular, when a trial with interim analyses is designed using a group sequential method, the expected treatment effect from a log-rank test statistic varies across analysis times and differs from the parameter specified under the alternative hypothesis. In this article, we present statistical analytical work that formulates a design including interim analyses with a survival endpoint under a delayed treatment effect alternative. Closed-form solutions are provided for calculating power and sample size over varying study/follow-up times for the group sequential, delayed treatment effect design. The analytical work is also presented graphically and simulations are conducted for validation.
- Research Article
- 10.1080/10543406.2023.2296055
- Dec 24, 2023
- Journal of Biopharmaceutical Statistics
Cancer immunotherapy trials are frequently characterized by a delayed treatment effect that violates the proportional hazards assumption. The log-rank test (LRT) suffers a substantial loss of statistical power under the nonproportional hazards model. Various group sequential designs using weighted LRTs (WLRTs) have been proposed under the fixed delayed treatment effect model. However, patients enrolled in immunotherapy trials are often heterogeneous, and the duration of the delayed treatment effect is a random variable. Therefore, we propose group sequential designs under the random delayed effect model using the random delayed distribution WLRT. The proposed group sequential designs are developed for monitoring the efficacy of the trial using the method of Lan-DeMets alpha-spending function with O’Brien-Fleming stopping boundaries or a gamma family alpha-spending function. The maximum sample size for the group sequential design is obtained by multiplying an inflation factor with the sample size for the fixed sample design. Simulations are conducted to study the operating characteristics of the proposed group sequential designs. The robustness of the proposed group sequential designs for misspecifying random delay time distribution and domain is studied via simulations.
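For readers unfamiliar with alpha-spending, the following Python sketch shows the two textbook spending functions named above as functions of the information fraction t in (0, 1]. It assumes the classic Lan-DeMets O'Brien-Fleming-type form 2(1 - Phi(z_{alpha/2} / sqrt(t))) and the gamma family form alpha(1 - exp(-gamma t)) / (1 - exp(-gamma)); the abstract does not state which parameterization or sidedness the authors used, and the function names and default values here are hypothetical.

```python
import math
from statistics import NormalDist

def obf_spending(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending (assumed classic form)."""
    if t <= 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))

def gamma_spending(t, alpha=0.05, gamma=-4.0):
    """Gamma family spending; gamma = 0 reduces to linear spending alpha * t."""
    if gamma == 0:
        return alpha * t
    return alpha * (1 - math.exp(-gamma * t)) / (1 - math.exp(-gamma))

# Cumulative type I error spent at three equally spaced looks.
looks = [1 / 3, 2 / 3, 1.0]
obf_spent = [obf_spending(t) for t in looks]
gamma_spent = [gamma_spending(t) for t in looks]
```

Both functions spend the full alpha at t = 1; the O'Brien-Fleming-type curve spends very little early, which keeps the final critical value close to that of a fixed sample design.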
- Research Article
5
- 10.1111/rssc.12345
- Mar 25, 2019
- Journal of the Royal Statistical Society Series C: Applied Statistics
Summary: We consider a comparison of Kaplan–Meier curves from clinical trials in which there may be a delayed treatment effect. Any such delay takes us outside the umbrella of a proportional hazards structure and therefore outside the setting in which the log-rank test would be optimal. The approach of Chauvel and O’Quigley based on Brownian motion approximations enables the construction of powerful tests in situations of non-proportionality and, in particular, a powerful test in the situation of delayed effect. The power of this test is seen to be very close to that of the most powerful test, which, however, is unavailable in practice. We show that the test is unbiased and consistent under general conditions. Under the null, we obtain identical large sample behaviour to the log-rank test so the type 1 error is correctly controlled. Under proportional hazards departures from the null we obtain results that indicate a manageable loss in power compared with the log-rank test. The usual sample size calculations can still provide a useful guide. Support for the theoretical findings is provided by simulations as well as illustrations from three immunotherapy clinical trials.
- Research Article
- 10.1200/jco.2022.40.16_suppl.e16222
- Jun 1, 2022
- Journal of Clinical Oncology
e16222 Background: Recently, immunotherapy has played a crucial role in treating liver cancer, one of the major cancers that contribute to the global cancer burden. Overall survival (OS) is widely applied in cancer trials to evaluate the treatment effects of new therapies. However, it requires more patients and longer follow-up time compared with progression-free survival (PFS). In addition, when assessing the treatment effects of cancer immunotherapy, the proportional hazards (PH) assumption is often violated due to issues such as delayed treatment effects. The restricted mean survival time (RMST) ratio is increasingly used for treatment effect evaluation when the PH assumption is violated. This change prompts an important question: is the surrogacy value of PFS affected when the RMST ratio is used to characterize the treatment effect? The aim of this study is to examine the feasibility of using PFS as a surrogate endpoint for OS when the treatment effect is measured using the hazard ratio (HR) versus the RMST ratio. Methods: The surrogacy of PFS for OS was evaluated by examining the association between PFS and OS using the HR and the RMST ratio. Seven immunotherapy studies published between 2000 and 2021 were included (Table). Information on the examined studies, such as treatment arms, OS, and PFS, was collected. RMST ratios for PFS and OS were calculated based on the Kaplan-Meier plots extracted from each article using WebPlotDigitizer 4.4. The weighted least squares regression lines and R^2 between OS and PFS for the HR and the RMST ratio were calculated. Results: Among the 7 immunotherapy studies, 4 gave placebo to the control arm, 2 gave sorafenib, and 1 gave the same drug as the treatment arm on a different schedule. All 7 studies had OS and PFS as endpoints. Two studies violated the PH assumption. Based on the data extracted from the examined articles, a moderate correlation (0.52) between PFS and OS was observed for the HR, while a low correlation (0.10) was observed for the RMST ratio. Conclusions: The R^2 values differ greatly depending on whether the HR or the RMST ratio was used for assessing surrogacy. Our finding may have important implications for the design of future immunotherapy liver cancer trials. For future work, increasing the number of included studies for a more comprehensive analysis is needed. Moreover, trial-level surrogacy analysis should be conducted to complement the study-level investigation. [Table: see text]
- Research Article
- 10.1016/j.medp.2024.100006
- Jan 24, 2024
- Medicine Plus
Delayed treatment effect predicting (DTEP) model for guiding immuno-oncology trial designs
- Research Article
6
- 10.1016/j.conctc.2015.08.003
- Oct 1, 2015
- Contemporary Clinical Trials Communications
A novel design for randomized immuno-oncology clinical trials with potentially delayed treatment effects
- Research Article
1
- 10.1080/10543406.2023.2244055
- Aug 11, 2023
- Journal of Biopharmaceutical Statistics
The delayed treatment effect, which manifests as a separation of survival curves after a change point, has often been observed in immunotherapy clinical trials. A late effect of this kind may violate the proportional hazards assumption, resulting in the non-negligible loss of statistical power of an ordinary log-rank test when comparing survival curves. The Fleming-Harrington (FH) test, a weighted log-rank test, is configured to mitigate the loss of power by incorporating a weight function with two parameters, one each for early and late treatment effects. The two parameters need to be appropriately determined, but no helpful guides have been fully established. Since the late effect is expected in immunotherapy trials, we focus on the late effect parameter in this study. We consider parameterizing the late effect in a readily interpretable fashion and determining the optimal late effect parameter in the FH test to maintain statistical power in reference to the asymptotic relative efficiency (ARE). The optimization is carried out under three lag models (i.e. linear, threshold, and generalized linear lag), where the optimal weights are proportional to the lag functions characterized by the change points. Extensive simulation studies showed that the FH test with the selected late parameter reliably provided sufficient power even when the change points in the lag models were misspecified. This finding suggests that the FH test with the ARE-guided late parameter may be a reasonable and practical choice for the primary analysis in immunotherapy clinical trials.
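A short Python sketch may help make the two-parameter weight concrete. It assumes the standard Fleming-Harrington weight w(t) = S(t-)^rho (1 - S(t-))^gamma, with S(t-) the pooled Kaplan-Meier estimate just before t, and the usual hypergeometric variance for the log-rank increments. It does not implement the ARE-guided choice of the late-effect parameter described in the abstract, and the function name and toy data are hypothetical.

```python
import math

def fh_weighted_logrank(times, events, groups, rho=0.0, gamma=1.0):
    """Fleming-Harrington weighted log-rank Z statistic.

    events: 1 = event, 0 = censored; groups: 1 = experimental, 0 = control.
    Positive Z means more events in group 1 than expected under the null.
    """
    data = sorted(zip(times, events, groups))
    n = len(data)
    at_risk = n
    at_risk1 = sum(g for _, _, g in data)
    s_minus = 1.0  # pooled Kaplan-Meier estimate just before the current time
    score = 0.0
    variance = 0.0
    i = 0
    while i < n:
        t = data[i][0]
        d = d1 = removed = removed1 = 0
        while i < n and data[i][0] == t:
            _, e, g = data[i]
            d += e
            d1 += e * g
            removed += 1
            removed1 += g
            i += 1
        if d > 0:
            w = (s_minus ** rho) * ((1.0 - s_minus) ** gamma)
            p1 = at_risk1 / at_risk
            score += w * (d1 - d * p1)
            if at_risk > 1:
                variance += w * w * d * p1 * (1 - p1) * (at_risk - d) / (at_risk - 1)
            s_minus *= 1.0 - d / at_risk
        at_risk -= removed
        at_risk1 -= removed1
    return score / math.sqrt(variance) if variance > 0 else 0.0

# rho = 0, gamma = 0 gives the ordinary log-rank test;
# rho = 0, gamma > 0 places more weight on late events.
z_logrank = fh_weighted_logrank([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0], rho=0.0, gamma=0.0)
z_late = fh_weighted_logrank([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0], rho=0.0, gamma=1.0)
```

With gamma > 0 the weight is zero at the first event time (where S(t-) = 1) and grows as the pooled survival falls, which is the mechanism by which the FH test emphasizes a late separation of the curves.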