Statistical inference methods for two crossing survival curves: a comparison of methods.

Huimin Li, Yawen Hou, Huilin Chen, Zheng Chen, Dong Han, Zhongxue Chen

doi:10.1371/journal.pone.0116774


Open Access

https://doi.org/10.1371/journal.pone.0116774


Journal: PLoS ONE
Publication Date: Jan 23, 2015
Citations: 130
License type: CC BY 4.0

Affiliation: Southern Medical University

Abstract

A common problem in medical applications is testing the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which is an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of the tests in different situations and for various censoring rates, and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman’s smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, the Rényi and Cramér-von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman’s smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives than the other tests.

Highlights

In clinical studies, the task of comparing the overall equality of two survival distributions with censored observations is a key element in survival analysis.
N: sample size of the two groups; CENR: censoring rate; LR: log-rank test; G01: Fleming-Harrington test with ρ = 0, γ = 1; G10: Fleming-Harrington test with ρ = 1, γ = 0; G11: Fleming-Harrington test with ρ = 1, γ = 1; GW: Gehan-Wilcoxon test [28]; TW: Tarone-Ware test [29]; SHL1: |Z1+Z2|/2; SHL2:/2; SHL3: max(|Z1|, |Z2|); MKS: modified Kolmogorov-Smirnov test; CVM1: Cramér-von Mises test based on Brownian motion; CVM2: Cramér-von Mises test based on Brownian bridge; WKM: weighted Kaplan-Meier test; MKM: maximum of the WKM tests; LW: Lin and Wang’s test [4]; LX1: Lin and Xu’s one-sided test; LX2: Lin and Xu’s two-sided test; TS: two-stage procedure; NY1: Neyman’s test with d = 4, not data-driven; NY2: Neyman’s test with d = 8, d0 = 0, data-driven and nested. doi:10.1371/journal.pone.0116774.t001
We considered a number of tests for the comparison of two survival distributions and investigated their performances for various sample sizes and censoring rates.

Summary

Introduction

The task of comparing the overall equality of two survival distributions with censored observations is a key element in survival analysis. It is well known that the commonly used log-rank test has optimum power under the assumption of proportional hazard rates. This assumption is often violated, especially when two survival curves cross each other. Simulation studies have demonstrated that the tests developed by Lin and Wang [4] and by Lin and Xu [1] perform better than the log-rank and Wilcoxon tests in the case of crossing survival curves. Liu et al. [3] conducted a comprehensive simulation study of three different patterns of crossing hazard rates and demonstrated that some weighted log-rank tests (log-rank, Gehan-Wilcoxon and Peto-Peto) lose power compared with methods that are designed to address the problem of crossing hazard rates.
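To make the weighted log-rank family concrete, the following is a minimal Python sketch of a two-sample Fleming-Harrington G(ρ, γ) test (ρ = γ = 0 gives the ordinary log-rank test) together with a small Monte Carlo loop that estimates rejection rates. It is an illustration only and not the simulation code used in the paper: the function names, the piecewise-exponential design (hazard 2 before t = 0.3 and 0.5 afterwards, against a constant hazard of 1), the uniform censoring on (0, 2), the sample sizes and the seed are all assumptions chosen for the example.

```python
import math
import random


def fh_weighted_logrank(times, events, groups, rho=0.0, gamma=0.0):
    """Fleming-Harrington G(rho, gamma) test; events: 1 = event, 0 = censored.

    groups holds 0/1 labels. Returns (z, two-sided p-value).
    """
    data = sorted(zip(times, events, groups))
    n = len(data)
    at_risk = n
    at_risk1 = sum(g for _, _, g in data)
    surv = 1.0  # pooled Kaplan-Meier estimate just before the current time
    num = var = 0.0
    i = 0
    while i < n:
        t = data[i][0]
        d = d1 = removed = removed1 = 0
        while i < n and data[i][0] == t:  # gather ties at time t
            _, e, g = data[i]
            d += e
            d1 += e * g
            removed += 1
            removed1 += g
            i += 1
        if d > 0:
            w = surv ** rho * (1.0 - surv) ** gamma
            frac1 = at_risk1 / at_risk
            num += w * (d1 - d * frac1)  # observed minus expected in group 1
            if at_risk > 1:  # hypergeometric variance term
                var += w * w * d * frac1 * (1 - frac1) * (at_risk - d) / (at_risk - 1)
            surv *= 1.0 - d / at_risk
        at_risk -= removed
        at_risk1 -= removed1
    if var == 0.0:
        return 0.0, 1.0
    z = num / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


def piecewise_exp(rng, h_early, h_late, t_change):
    """Draw a time whose hazard is h_early before t_change and h_late after."""
    e = rng.expovariate(1.0)  # unit-exponential cumulative hazard
    if e < h_early * t_change:
        return e / h_early
    return t_change + (e - h_early * t_change) / h_late


def rejection_rate(h_early, h_late, t_change, rho, gamma,
                   n_per_group=50, n_sim=500, cens_max=2.0, alpha=0.05, seed=1):
    """Monte Carlo rejection rate; group 0 is Exp(1), group 1 is piecewise."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        times, events, groups = [], [], []
        for g in (0, 1):
            for _ in range(n_per_group):
                if g == 0:
                    t = rng.expovariate(1.0)
                else:
                    t = piecewise_exp(rng, h_early, h_late, t_change)
                c = rng.uniform(0.0, cens_max)  # assumed censoring mechanism
                times.append(min(t, c))
                events.append(1 if t <= c else 0)
                groups.append(g)
        _, p = fh_weighted_logrank(times, events, groups, rho, gamma)
        rejections += p < alpha
    return rejections / n_sim


if __name__ == "__main__":
    for name, rho, gamma in [("LR", 0, 0), ("G10", 1, 0), ("G01", 0, 1)]:
        size = rejection_rate(1.0, 1.0, 0.3, rho, gamma)   # identical groups
        power = rejection_rate(2.0, 0.5, 0.3, rho, gamma)  # crossing curves
        print(f"{name}: type I error ~ {size:.3f}, power ~ {power:.3f}")
```

Under these assumed hazards the two survival curves cross at t = 0.9, so the script contrasts the rejection rate when the groups are identical with the rejection rate under a crossing alternative. With only 500 replicates the estimates are noisy; a serious comparison would use many more replicates and a range of crossing times and censoring rates.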

Methods

Existing methods

Findings

Discussion