Robust Modestly Weighted Log-Rank Tests.
The introduction of checkpoint inhibitors in immuno-oncology has raised questions about the suitability of the log-rank test as the default primary analysis method in confirmatory studies, particularly when survival curves exhibit non-proportional hazards. The log-rank test, while effective in controlling false positive rates, may lose power in scenarios where survival curves remain similar for extended periods before diverging. To address this, various weighted versions of the log-rank test have been proposed, including the "MaxCombo" test, which combines multiple weighted log-rank statistics to enhance power across a range of alternative hypotheses. Despite its potential, the MaxCombo test has seen limited adoption, possibly because it can produce counterintuitive results in situations where the hazard functions on the two arms cross. In response, the modestly weighted log-rank test was developed to provide a balanced approach, giving greater weight to later event times while avoiding undue influence from early detrimental effects. However, this test also faces limitations, particularly if the possibility of early separation of survival curves cannot be ruled out a priori. We propose a novel test statistic that integrates the strengths of the standard log-rank test, the modestly weighted log-rank test, and the MaxCombo test. By considering the maximum of the standard log-rank statistic and a modestly weighted log-rank statistic, the new test aims to maintain power under delayed effect scenarios while minimizing power loss relative to the log-rank test in worst-case scenarios. Simulation studies and a case study demonstrate the efficiency and robustness of this approach, highlighting its potential as a robust alternative for primary analysis in immuno-oncology trials.
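The idea of taking the maximum of the standard and a modestly weighted log-rank statistic can be illustrated with a short sketch. This is not the authors' implementation. It assumes the modest weights have the form w(t) = 1 / max(S(t-), S(t*)), where S is the pooled Kaplan-Meier estimate (the form proposed by Magirr and Burman), and the function names, the toy data, and the choice t* = 5 are all hypothetical.

```python
import math

def risk_table(times, events, arms):
    """One row per distinct event time: (t, n, n1, d, d1).

    n / n1 = number at risk overall / in arm 1;
    d / d1 = number of events overall / in arm 1.
    """
    rows = []
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, a in zip(times, arms) if u >= t and a == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, a in zip(times, events, arms)
                 if u == t and e and a == 1)
        rows.append((t, n, n1, d, d1))
    return rows

def modest_weights(rows, t_star):
    """Assumed weight form: 1 / max(S(t-), S(t*)), S = pooled Kaplan-Meier."""
    s, s_star, s_minus = 1.0, 1.0, []
    for t, n, _, d, _ in rows:
        s_minus.append(s)          # S just before this event time
        s *= 1 - d / n             # Kaplan-Meier update
        if t <= t_star:
            s_star = s             # S(t*) = value after last event <= t*
    return [1 / max(sm, s_star) for sm in s_minus]

def weighted_logrank_z(rows, weights):
    """Standardised weighted log-rank statistic.

    Positive values mean arm 1 had fewer events than expected under the null.
    """
    num, var = 0.0, 0.0
    for (_, n, n1, d, d1), w in zip(rows, weights):
        num += w * (d * n1 / n - d1)              # expected minus observed
        if n > 1:                                  # hypergeometric variance
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num / math.sqrt(var)

# Toy data (hypothetical): arm 1 = experimental, event 1 = death, 0 = censored.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
arms = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]

rows = risk_table(times, events, arms)
z_lr = weighted_logrank_z(rows, [1.0] * len(rows))   # standard log-rank
w_mw = modest_weights(rows, t_star=5)
z_mw = weighted_logrank_z(rows, w_mw)                # modestly weighted
z_max = max(z_lr, z_mw)                              # combined statistic
print(z_lr, z_mw, z_max)
```

Because the two statistics are correlated, `z_max` cannot be compared with a standard normal critical value; a p-value has to come from their joint null distribution, which this sketch does not compute.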
- Research Article
34
- 10.1001/jamaoncol.2022.2666
- Jul 21, 2022
- JAMA Oncology
Background: The log-rank test is considered the criterion standard for comparing 2 survival curves in pivotal registrational trials. However, with novel immunotherapies that often violate the proportional hazards assumption over time, the log-rank test can lose power and may fail to detect treatment benefit. The MaxCombo test, a combination of weighted log-rank tests, retains power under different types of nonproportional hazards. The difference in restricted mean survival time (dRMST) test is frequently proposed as an alternative to the log-rank test under nonproportional hazard scenarios. Objective: To compare the log-rank test with the MaxCombo and dRMST tests in immuno-oncology trials and evaluate their performance in practice. Methods: A comprehensive literature review was conducted using Google Scholar, PubMed, and other sources for randomized clinical trials published in peer-reviewed journals or presented at major clinical conferences before December 2019 assessing efficacy of anti-programmed cell death protein-1 or anti-programmed death/ligand 1 monoclonal antibodies. Pivotal studies with overall survival or progression-free survival as the primary or key secondary end point and a planned statistical comparison in the protocol were eligible. Sixty-three studies on anti-programmed cell death protein-1 or anti-programmed death/ligand 1 monoclonal antibodies used as monotherapy or in combination with other agents in 35 902 patients across multiple solid tumor types were identified. Statistical comparisons (n = 150) were made between the 3 tests using the analysis populations as defined in the original protocol of each trial. Nominal significance based on a 2-sided .05-level test was used to evaluate concordance. Case studies featuring different types of nonproportional hazards were used to discuss more robust ways of characterizing treatment benefit instead of sole reliance on hazard ratios.
Results: In this systematic review and meta-analysis of 63 studies including 35 902 patients, 135 of 150 comparisons (90%) between the log-rank and MaxCombo tests were concordant; in all 15 discordant cases, MaxCombo achieved nominal significance while log-rank did not. Several cases appeared to have clinically meaningful benefits that would not have been detected using log-rank. Between the log-rank and dRMST tests, 137 of 150 comparisons (91%) were concordant; of the 13 discordant cases, log-rank alone was nominally significant in 5 and dRMST alone in 8. Among all 3 tests, 127 comparisons (85%) were concordant. Conclusions: The findings of this review show that MaxCombo may provide a pragmatic alternative to log-rank when departure from proportional hazards is anticipated. Both tests resulted in the same statistical decision in most comparisons. Discordant studies had modest to meaningful improvements in treatment effect. The dRMST test provided no added sensitivity for detecting treatment differences over log-rank.
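For readers unfamiliar with the quantity behind the dRMST test, the sketch below computes a restricted mean survival time as the area under a Kaplan-Meier curve up to a truncation time tau, and then the difference between two arms. It is a minimal illustration only: the function names, the toy data, and tau = 4 are hypothetical, and no variance or p-value is computed, so it is not the test used in the review.

```python
def km_curve(times, events):
    """Kaplan-Meier estimate as a list of (event_time, S(event_time))."""
    s, curve = 1.0, []
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                    # at risk
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        s *= 1 - d / n
        curve.append((t, s))
    return curve

def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in km_curve(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)      # rectangle up to this event time
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)  # final rectangle up to tau

# Toy data (hypothetical), no censoring, so RMST equals the mean of min(T, tau).
control_times, control_events = [1, 2, 3, 4], [1, 1, 1, 1]
treated_times, treated_events = [2, 4, 6, 8], [1, 1, 1, 1]

rmst_control = rmst(control_times, control_events, tau=4)
rmst_treated = rmst(treated_times, treated_events, tau=4)
d_rmst = rmst_treated - rmst_control
print(rmst_control, rmst_treated, d_rmst)
```

The result depends on the choice of tau, which in practice has to be prespecified.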
- Research Article
19
- 10.1016/j.athoracsur.2011.12.094
- Apr 25, 2012
- The Annals of Thoracic Surgery
Review of Case-Mix Corrected Survival Curves
- Abstract
39
- 10.1186/1745-6215-12-s1-a137
- Dec 1, 2011
- Trials
Background: It is not uncommon for clinical trials to present results on survival time as Kaplan-Meier survival curves that cross, indicating non-proportional hazards. A recent example was given in a pivotal trial in advanced non-small cell lung cancer (the ‘IPASS study’ [1]). Trials such as these present a hazard ratio and log-rank test for treatment comparison as this is their planned primary analysis. However, the validity of such analysis is questionable and has received published criticism. This paper reviews the use of the log-rank test with crossing curves and considers alternatives that have been proposed. Methods: The review of the alternative approaches includes weighted log-rank tests (Wilcoxon, Tarone-Ware, Peto-Prentice and Fleming-Harrington), supremum versions of the log-rank test (modified Kolmogorov-Smirnov and Renyi-type tests), which are based on the maximum difference between estimates of two survivor functions, and modified log-rank tests (Lin and Wang test using squared differences at each time point, and Levene-type test focusing on variance differences). In addition, methods based on splitting the analysis at the crossing point have also been proposed. Methods are compared and evaluated using both real and simulated datasets, the latter drawn from Weibull and Weibull-Cox distributions representing realistic situations. Results: Crossing of survival curves is generally a result of the survival times having greater variance in one treatment group than another. The performance of the log-rank test and its alternatives depends on the type of crossing (early, mid or late), but in general the probability of a Type II error is increased for log-rank and weighted log-rank tests, while performance is improved with the alternatives. The choice of time-point for the split-analysis is problematic. Standard software such as sts test (Stata), proc lifetest (SAS) and survfit (R) and routines-on-demand support some but not all the tests considered.
Conclusions: There is a need in the clinical community to clarify methods that are appropriate when survival curves cross. Statistical analysis plans for clinical trials with survival as primary outcome measure should specify an analysis dependent on the proportionality of hazard rates and explicitly consider non-proportionality issues, powering the analyses based on log-rank alternatives. Modelling the survival data may be more appropriate than simple univariate hypothesis tests when hazards are not proportional. Finally, there are some feasibility issues regarding software for such analysis that remain to be tackled.
- Front Matter
9
- 10.1016/j.pmrj.2016.04.003
- Jun 1, 2016
- PM&R
Introduction to Survival Analysis
- Research Article
24
- 10.1002/pst.376
- Mar 20, 2009
- Pharmaceutical Statistics
The assessment of overall homogeneity of time-to-event curves is a key element in survival analysis in biomedical research. The currently commonly used testing methods, e.g. log-rank test, Wilcoxon test, and Kolmogorov-Smirnov test, may have a significant loss of statistical testing power under certain circumstances. In this paper we propose a new testing method that is robust for the comparison of the overall homogeneity of survival curves based on the absolute difference of the area under the survival curves using normal approximation by Greenwood's formula. Monte Carlo simulations are conducted to investigate the performance of the new testing method compared against the log-rank, Wilcoxon, and Kolmogorov-Smirnov tests under a variety of circumstances. The proposed new method has robust performance with greater power to detect the overall differences than the log-rank, Wilcoxon, and Kolmogorov-Smirnov tests in many scenarios in the simulations. Furthermore, the applicability of the new testing approach is illustrated in a real data example from a kidney dialysis trial.
- Research Article
38
- 10.1186/s12874-022-01520-0
- Jan 30, 2022
- BMC Medical Research Methodology
Background: The exchange of knowledge between statisticians developing new methodology and clinicians, reviewers or authors applying them is fundamental. This is specifically true for clinical trials with time-to-event endpoints. One of the most commonly arising questions is that of equal survival distributions in a two-armed trial. The log-rank test is still the gold standard to address this question. However, in case of non-proportional hazards, its power can become poor and multiple extensions have been developed to overcome this issue. We aim to facilitate the choice of a test for the detection of survival differences in the case of crossing hazards. Methods: We restricted the review to the most recent two-armed clinical oncology trials with crossing survival curves. Each data set was reconstructed using a state-of-the-art reconstruction algorithm. To ensure reproduction quality, only publications with published number at risk at multiple time points, sufficient printing quality and a non-informative censoring pattern were included. This article depicts the p-values of the log-rank and Peto-Peto test as references and compares them with nine different tests developed for detection of survival differences in the presence of non-proportional or crossing hazards. Results: We reviewed 1400 recent phase III clinical oncology trials and selected fifteen studies that met our eligibility criteria for data reconstruction. After including a further three individual patient data sets, for nine out of eighteen studies significant differences in survival were found using the investigated tests. An important point that reviewers should pay attention to is that 28% of the studies with published survival curves did not report the number at risk. This makes reconstruction and plausibility checks almost impossible. Conclusions: The evaluation shows that inference methods constructed to detect differences in survival in presence of non-proportional hazards are beneficial and help to provide guidance in choosing a sensible alternative to the standard log-rank test.
- Research Article
359
- 10.1097/00005792-200203000-00005
- Mar 1, 2002
- Medicine
Predicting Mortality in Systemic Sclerosis
- Research Article
2
- 10.1002/bimj.202100403
- Feb 15, 2023
- Biometrical Journal
For sample size calculation in clinical trials with survival endpoints, the logrank test, which is the optimal method under the proportional hazards (PH) assumption, is predominantly used. In reality, the PH assumption may not hold. For example, in immuno-oncology trials, delayed treatment effects are often expected. A sample size calculated without considering the potential violation of the PH assumption may lead to an underpowered study. In recent years, combination tests such as the maximum weighted logrank test have received great attention because of their robust performance in various hazards scenarios. In this paper, we propose a flexible simulation-free procedure to calculate the sample size using combination tests. The procedure extends Lakatos' Markov model and allows for complex situations encountered in a clinical trial, like staggered entry, dropouts, etc. We evaluate the procedure using two maximum weighted logrank tests, one projection-type test, and three other commonly used tests under various hazards scenarios. The simulation studies show that the proposed method can achieve the target power for all compared tests in most scenarios. The combination tests exhibit robust performance under correct specification and misspecification scenarios and are highly recommended when the hazard-changing patterns are unknown beforehand. Finally, we demonstrate our method using two clinical trial examples and provide suggestions about the sample size calculations under nonproportional hazards.
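As a point of reference for the conventional logrank-based calculation under proportional hazards, the sketch below uses Schoenfeld's approximation for the required number of events. This is a swapped-in textbook formula, not the Markov-model procedure proposed in the paper; the function name, default values, and the example hazard ratio of 0.7 are hypothetical.

```python
import math
from statistics import NormalDist

def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.9, alloc=0.5):
    """Approximate number of events for a two-sided logrank test under PH.

    alloc is the proportion of patients randomised to one arm
    (0.5 corresponds to 1:1 allocation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (
        alloc * (1 - alloc) * math.log(hazard_ratio) ** 2
    )

required = schoenfeld_events(0.7)      # hypothetical design: HR = 0.7
print(math.ceil(required))
```

The formula assumes a constant hazard ratio, so it can understate the events needed when the treatment effect is delayed, which is the situation the abstract describes.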
- Research Article
69
- 10.1186/s12874-016-0110-x
- Feb 11, 2016
- BMC Medical Research Methodology
Background: Most randomized controlled trials with a time-to-event outcome are designed assuming proportional hazards (PH) of the treatment effect. The sample size calculation is based on a logrank test. However, non-proportional hazards are increasingly common. At analysis, the estimated hazard ratio with a confidence interval is usually presented. The estimate is often obtained from a Cox PH model with treatment as a covariate. If non-proportional hazards are present, the logrank and equivalent Cox tests may lose power. To safeguard power, we previously suggested a ‘joint test’ combining the Cox test with a test of non-proportional hazards. Unfortunately, a larger sample size is needed to preserve power under PH. Here, we describe a novel test that unites the Cox test with a permutation test based on restricted mean survival time. Methods: We propose a combined hypothesis test based on a permutation test of the difference in restricted mean survival time across time. The test involves the minimum of the Cox and permutation test P-values. We approximate its null distribution and correct it for correlation between the two P-values. Using extensive simulations, we assess the type 1 error and power of the combined test under several scenarios and compare with other tests. We investigate powering a trial using the combined test. Results: The type 1 error of the combined test is close to nominal. Power under proportional hazards is slightly lower than for the Cox test. Enhanced power is available when the treatment difference shows an ‘early effect’, an initial separation of survival curves which diminishes over time. The power is reduced under a ‘late effect’, when little or no difference in survival curves is seen for an initial period and then a late separation occurs. We propose a method of powering a trial using the combined test. The ‘insurance premium’ offered by the combined test to safeguard power under non-PH represents about a single-digit percentage increase in sample size. Conclusions: The combined test increases trial power under an early treatment effect and protects power under other scenarios. Use of restricted mean survival time facilitates testing and displaying a generalized treatment effect.
- Research Article
2
- 10.3390/ijerph20247164
- Dec 11, 2023
- International journal of environmental research and public health
Often in the planning phase of a clinical trial, a researcher will need to choose between a standard versus weighted log-rank test (LRT) for investigating right-censored survival data. While a standard LRT is optimal for analyzing evenly distributed but distinct survival events (proportional hazards), an appropriately weighted LRT may be better suited for handling non-proportional, delayed treatment effects. The "a priori" misspecification of this alternative may result in a substantial loss of power when determining the effectiveness of an experimental drug. In this paper, the standard unweighted and inverse log-rank tests (iLRTs) are compared with the multiple weight, default Max-Combo procedure for analyzing differential late survival outcomes. Unlike combination LRTs that depend on the arbitrary selection of weights, the iLRT by definition is a single weight test and does not require implicit multiplicity correction. Empirically, both weighted methods have reasonable flexibility for assessing continuous survival curve differences from the onset of a study. However, the iLRT may be preferable for accommodating delayed separating survival curves, especially when one arm finishes first. Using standard large-sample methods, the power and sample size for the iLRT are easily estimated without resorting to complex and time-consuming simulations.
- Abstract
- 10.1136/jitc-2022-sitc2022.0542
- Nov 1, 2022
- Journal for ImmunoTherapy of Cancer
Background: Intratumoral B-cells are associated with improved survival with ICB in sarcomas [1]. We investigated the dynamics of intratumoral and peripheral BCR repertoires and their association with survival in dedifferentiated liposarcoma...
- Research Article
2
- 10.1111/j.1939-1676.2009.0294.x
- May 1, 2009
- Journal of Veterinary Internal Medicine
Correspondence
- Research Article
3
- 10.3760/cma.j.issn.0376-2491.2011.30.009
- Aug 16, 2011
- National Medical Journal of China
Objective: To compare distant disease-free survival (DDFS) between breast cancer patients with nodal pathological complete response (pCR) and those with nodal residual disease (RD) after neoadjuvant chemotherapy. Methods: The clinical and pathological data of 376 patients with needle biopsy-proven, node-positive breast cancer who underwent neoadjuvant chemotherapy were retrospectively analyzed. Results: The median follow-up time was 24 months (range: 5 - 100). The pCR rate of the axillary lymph nodes was 30.9%. The three-year DDFS rates were 91.7% and 78.8% in patients with axillary lymph node pCR and RD, respectively. The log-rank test showed a significant difference between the survival curves (P = 0.016). Multivariate analysis showed that the relative risk of DDFS for patients with RD was 2.14 times that of the pCR group (P = 0.047). No significant difference was found between the disease-free survival (DFS) curves of the two groups. Within the RD group, DDFS differed significantly between patients with ≤ 3 and those with ≥ 4 lymph node metastases (P = 0.001). Conclusion: Distant disease-free survival in node-positive breast cancer is associated with the status of the axillary lymph nodes after neoadjuvant chemotherapy.
- Abstract
- 10.1093/ofid/ofab466.750
- Dec 4, 2021
- Open Forum Infectious Diseases
Background: Growing evidence supports the use of remdesivir and tocilizumab for the treatment of hospitalized patients with severe COVID-19. The purpose of this study was to evaluate the use of remdesivir and tocilizumab for the treatment of severe COVID-19 in a community hospital setting.
Methods: We used a de-identified dataset of hospitalized adults with severe COVID-19 according to the National Institutes of Health definition (SpO2 < 94% on room air, a PaO2/FiO2 < 300 mm Hg, respiratory frequency > 30/min, or lung infiltrates > 50%) admitted to our community hospital located in Evanston, Illinois, between March 1, 2020, and March 1, 2021. We fitted a Cox proportional hazards regression model to examine the relationship between the use of remdesivir and tocilizumab and inpatient mortality. To minimize confounding, we adjusted for age, qSOFA score, noninvasive positive-pressure ventilation, invasive mechanical ventilation, and steroids, forcing these variables into the model. We implemented a sensitivity analysis calculating the E-value (with the lower confidence limit) for the obtained point estimates to assess the potential effect of unmeasured confounding.
Results: A total of 549 patients were included. The median age was 69 years (interquartile range, 59 – 80 years), 333 (59.6%) were male, 231 were White (41.3%), and 235 (42%) were admitted from long-term care facilities. 394 (70.5%) received steroids, 192 (34.3%) received remdesivir, and 49 (8.8%) received tocilizumab. By the cutoff date for data analysis, 389 (69.6%) patients survived, and 170 (30.4%) had died. The bivariable Cox regression models showed decreased hazard of in-hospital death associated with the administration of steroids (Figure 1), remdesivir (Figure 2), and tocilizumab (Figure 3). This association persisted in the multivariable Cox regression controlling for other predictors (Figure 4). The E-values for the multivariable Cox regression point estimates and the lower confidence limits are shown in Table 1.
Figures 1 to 3. Kaplan–Meier survival curves for in-hospital death among patients treated with and without steroids (Figure 1), remdesivir (Figure 2), and tocilizumab (Figure 3). In each figure, the hazard ratio was derived from a bivariable Cox regression model. The survival curves were compared with a log-rank test, where a two-sided P value of less than 0.05 was considered statistically significant.
Figure 4. Forest plot on effect estimates and confidence intervals for treatments. The hazard ratios were derived from a multivariable Cox regression model adjusting for age as a continuous variable, qSOFA score, noninvasive positive-pressure ventilation, and invasive mechanical ventilation.
Table 1. Sensitivity analysis of unmeasured confounding using E-values. CI, confidence interval. Point estimate from multivariable Cox regression model. The E-value is defined as the minimum strength of association on the risk ratio scale that an unmeasured confounder would need to have with both the exposure and the outcome, conditional on the measured covariates, to fully explain away a specific exposure-outcome association. For example, a confounder not included in the multivariable Cox regression model that was associated with both remdesivir or tocilizumab use and in-hospital death in patients with severe COVID-19 by a hazard ratio of 1.64 or 1.54, respectively, could explain away the lower confidence limit, but weaker confounding could not.
Conclusion: For patients with severe COVID-19 admitted to our community hospital, the use of steroids, remdesivir, and tocilizumab was significantly associated with a slower progression to in-hospital death while controlling for other predictors included in the models.
Disclosures: All Authors: No reported disclosures
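The E-value definition given in this abstract corresponds to a simple closed-form expression, E = RR + sqrt(RR × (RR − 1)) for a risk ratio RR ≥ 1 (VanderWeele and Ding), with protective estimates inverted first. The sketch below applies that formula directly. The function name and the example input are hypothetical, and the abstract does not report the underlying estimates or say whether hazard ratios were converted to the risk ratio scale beforehand, so the code does not attempt to reproduce its values.

```python
import math

def e_value(ratio):
    """E-value for an effect estimate expressed on the risk ratio scale.

    Estimates below 1 (protective effects) are inverted before applying
    E = RR + sqrt(RR * (RR - 1)).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rr = ratio if ratio >= 1 else 1 / ratio
    return rr + math.sqrt(rr * (rr - 1))

# Hypothetical protective estimate of 0.5, equivalent to RR = 2 after inversion.
print(round(e_value(0.5), 2))
```

To obtain the E-value for a confidence limit, the same function is applied to the limit closest to 1; if the interval includes 1, that E-value is 1 by convention.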
- Research Article
- 10.1016/j.joms.2011.06.108
- Sep 1, 2011
- Journal of Oral and Maxillofacial Surgery
Poster 08: Salvage Surgery for Recurrent Squamous Cell Carcinoma of the Tongue