Rejoinder for “Meta‐analysis for Surrogacy: Accelerated Failure Time Models and Semicompeting Risks Modeling”

Debashis Ghosh, Jeremy M. G. Taylor, Daniel J. Sargent

doi:10.1111/j.1541-0420.2011.01638.x

Abstract

We would first like to express our appreciation to co-editor David Zucker and the Associate Editor for organizing this discussion. We also thank the discussants for their comments on our paper. They have raised many excellent points, and in our response, we only deal with a subset of them.
Geert Molenberghs (M) and John O’Quigley and Philippe Flandre (OF) accurately describe the methodology in our paper as joint regression and association modelling of the surrogate and true endpoints in which a constraint is placed on the type of data that are used (the “wedge” region). As OF noted, this constraint leads to the multistate model of Fix and Neyman (1951). This data structure complicates the standard estimation procedures that were developed by Burzykowski et al. (2005, Ch. 11). However, much of the model formulation is very similar to what was described there. The constraints in our approach can be viewed as a different model for the error distribution.
Our focus is not on predictions, as advocated by Edward Korn (K), partly because it is very hard with censored data to estimate the intercept parameter in a linear model well without making strong assumptions (Ying et al., 1995). K is suspicious of the standard errors in our semi-competing risks analysis, but our application of the methodology to data from Ghosh (2009) yielded essentially identical answers to those reported there (data not shown). An implication of the artificial censoring strategy we propose here is that we are throwing away information on recurrences. Consequently, the standard errors for the treatment effects on the surrogate endpoint will increase in our approach relative to approaches that do not throw away that information (e.g. the analyses in Table 1 of K’s discussion). An implication of the semi-competing risks approach will be that the magnitude of the treatment effect on the surrogate endpoint will be less than or equal to that on the true endpoint because of the wedge constraint.
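To make the “wedge” region concrete, the following is a minimal sketch of how latent times might be mapped to observed semi-competing risks data. It assumes one common convention from the semi-competing risks literature (surrogate time S, true endpoint time T, independent censoring time C); the function name and variable names are illustrative and are not taken from our paper or its software.

```python
import numpy as np

def observe_semicompeting(s, t, c):
    """Map latent times to observed semi-competing risks data.

    s: latent time to the surrogate endpoint (e.g. recurrence)
    t: latent time to the true endpoint (e.g. death)
    c: censoring time
    Hypothetical helper; the naming is an assumption made for illustration.
    """
    s, t, c = (np.asarray(a, dtype=float) for a in (s, t, c))
    # The true endpoint is subject only to censoring.
    y = np.minimum(t, c)
    delta_t = (t <= c).astype(int)
    # The surrogate is seen only if it precedes both the true endpoint and censoring.
    x = np.minimum(s, y)
    delta_s = (s <= y).astype(int)
    return x, delta_s, y, delta_t

# Three illustrative subjects: recurrence then death; death before
# recurrence; censored before either event.
x, delta_s, y, delta_t = observe_semicompeting(
    s=[2.0, 5.0, 4.0], t=[3.0, 4.0, 6.0], c=[10.0, 10.0, 1.0]
)
```

By construction every observed pair satisfies x <= y, so the data fall in the wedge; for the second subject the surrogate time is censored by the true endpoint, which is the feature that distinguishes semi-competing risks from ordinary bivariate censoring.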
Table 1. R² values for recurrence in the colorectal cancer data.
Vance Berger, Grant Izmirlian and Diana Knoll (BIK) and K criticize us with respect to composite endpoints. There are two issues here. The first is whether or not composite endpoints should be used for assessing treatment effects in clinical trials. BIK and K strongly advocate for composite endpoints such as disease-free survival in oncology trials. Since disease-free survival is arguably a meaningful clinical endpoint, we agree with BIK and K’s point if the goal is simply to understand the treatment effect. However, a second goal is attempting to understand the association between the surrogate endpoint and the true endpoint. As we discussed in the paper, this is problematic if the surrogate endpoint is a composite endpoint that uses information on the true endpoint. In the context of the motivating colorectal cancer example, we are arguing that recurrence and death are separate processes. One can interpret our modelling strategy as a model for the process that gave rise to the data, rather than a model for the observed data. In modelling the biology in this context it is useful to recognize that recurrence is not a spontaneous event. It occurs because the cancer is regrowing and reaches a size where it is detected. From this perspective there is some rationale for considering when the cancer would have grown to a detectable size had the patient not died from something else. K says patients might die from their disease without having progression or without having it observed. That is context-dependent and pretty rare in the cancer clinical trials we analyze.
OF and K advocate the use of proportional hazards (PH) models in their discussions. Since we were focusing on estimation using the wedge constraint, PH models were not available to us. The recent work of Xu et al. (2010), discussed by OF, allows for proportional hazards models for S and T in the semi-competing risks setting.
The type of R² that OF describe comes from a comparison of models for T|Z and T|S,Z. It is not at all straightforward to calculate this quantity here, for two reasons. The first is that including S as a covariate, in conjunction with the constraint that S < T, will complicate estimation. Second, provided one could develop a valid method for estimation in the model for T|S,Z with S < T, calculating an R²-type measure poses its own issues. Guidance for constructing such measures would come from previous proposals to create likelihood ratio-type statistics from estimating equations (e.g., Li, 1993). OF were interested in the R² values for our example. We show them in Table 1 for the colorectal cancer data in which parametric Weibull PH models are fit, along with adjustment for stage, age (log-transformed) and treatment. The method of Nagelkerke (1991) for calculating R² was used. The values range between 0.38 and 0.56; this is in comparison with the R² of 0.69 that OF obtain in their example. The question remains of how to set guidelines for the R² value in deciding whether to use the surrogate marker.
M makes a push for performing sensitivity analyses in our modelling procedures. We agree this is an important task and an area for future research. He also asks about the potential for causal interpretations of the parameters that we have estimated. Using the structural modelling framework of Pearl (2001), we (Ghosh et al., 2010) have recently shown that the relative effect (i.e. ratio of the two regression coefficients) can be interpreted as a causal parameter in the linear case. There has been recent work on framing the surrogacy problem in the potential outcomes framework (Gilbert and Hudgens, 2008; Li et al., 2010). Attempting to incorporate the semi-competing risks data structure into the potential outcomes framework is more challenging.
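As an aside on the R² values discussed above, a likelihood-based R² in the style of Nagelkerke (1991) can be sketched from the maximized log-likelihoods of two nested fits. The sketch below assumes the usual form 1 − exp{−(2/n)(ℓ_full − ℓ_null)}, optionally rescaled by its maximum 1 − exp{(2/n)ℓ_null}. The function names are illustrative; which variant was applied, and whether n counts subjects or events, are assumptions here and not details stated in the text. The rescaling is motivated by discrete-outcome likelihoods and may not be sensible for continuous, censored data, where the null log-likelihood need not be negative.

```python
import math

def generalized_r2(loglik_null, loglik_full, n):
    """Likelihood-ratio-based R^2: 1 - exp(-(2/n) * (l_full - l_null))."""
    return 1.0 - math.exp(-2.0 * (loglik_full - loglik_null) / n)

def nagelkerke_r2(loglik_null, loglik_full, n):
    """Generalized R^2 divided by its maximum, 1 - exp((2/n) * l_null).

    Only meaningful when that maximum is positive (l_null < 0).
    """
    r2_max = 1.0 - math.exp(2.0 * loglik_null / n)
    if r2_max <= 0.0:
        raise ValueError("rescaling undefined: null log-likelihood is not negative")
    return generalized_r2(loglik_null, loglik_full, n) / r2_max

# Illustrative numbers only (not taken from the colorectal cancer data).
r2 = generalized_r2(loglik_null=-100.0, loglik_full=-80.0, n=100)
r2_scaled = nagelkerke_r2(loglik_null=-100.0, loglik_full=-80.0, n=100)
```

In an application such as the one described, the null model might be a Weibull fit for T given Z and the adjustment covariates, and the full model the same fit with S added; that pairing is our reading of the comparison OF describe, not a prescription.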
Suppose we define the potential outcomes $\{S_i^*(1), S_i^*(0), T_i^*(1), T_i^*(0)\}$, $i = 1, \ldots, n$, where $\{S_i^*(Z), T_i^*(Z)\}$ denotes the joint potential outcome for time to the surrogate and true endpoints, respectively, for the ith individual if assigned treatment Z, with Z = 0 or 1. Then causal estimands are defined to be within-individual contrasts in $T^*$ and $S^*$. Frangakis and Rubin (2002) defined the concept of principal stratification, in which within-individual contrasts for $T^*$ are considered conditional on $S^*$. The problem with the semi-competing risks approach is that $S^*$ might not be well-defined if the person experiences the true endpoint but not the surrogate endpoint. This has been referred to by Zhang and Rubin (2003) as “truncation by death.” While the potential outcomes framework might not allow for well-conceptualized causal estimands with semi-competing risks data, this is not the only model for causality that exists in the literature. In particular, econometricians work with so-called structural selection models (Abbring and van den Berg, 2003), and such a modelling framework might allow for better incorporation of semi-competing risks data. Of course, “causal estimand” has a different meaning using these models relative to the potential outcomes framework. This research is currently under investigation.
We broadly agree with much of what the discussants proposed regarding trial-level meta-analysis. K advocates prediction, but as noted before, that is not straightforward in our modelling scheme with censoring present. In our example, we combined treatment arms despite well-documented evidence of heterogeneity in the different groups. We note that there are 56 and 48 subjects with time to recurrence and time to death equalling zero, all of whom are censored, so this represents a very small percentage of observations.
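The “truncation by death” difficulty described above can be illustrated with a toy sketch. It assumes, purely for illustration, that each individual's four potential times are known and that a surrogate time which never occurs is coded as infinity; the function name is hypothetical and this is not an estimation procedure.

```python
import math

def surrogate_contrast(s1, t1, s0, t0):
    """Within-individual contrast S*(1) - S*(0).

    Returns None when the surrogate is truncated by the true endpoint
    under either treatment assignment, since the contrast is then not
    well-defined. Toy illustration with all potential outcomes known.
    """
    defined_under_1 = s1 < t1  # surrogate occurs before the true endpoint if Z = 1
    defined_under_0 = s0 < t0  # surrogate occurs before the true endpoint if Z = 0
    if defined_under_1 and defined_under_0:
        return s1 - s0
    return None

# Surrogate observed under both assignments: the contrast is defined.
both_defined = surrogate_contrast(s1=2.0, t1=5.0, s0=1.5, t0=4.0)
# Under Z = 1 the individual reaches the true endpoint with no surrogate event.
truncated = surrogate_contrast(s1=math.inf, t1=3.0, s0=1.0, t0=4.0)
```

Only individuals for whom both surrogate times precede the corresponding true endpoint times yield a contrast, which is one way of seeing why a principal stratification argument needs care in this data structure.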
The lively discussion of our paper has led us to consider a compromise between the association framework proposed here and another view of surrogates, termed auxiliary variables, that might lead to greater consensus. If T is missing, auxiliary variable methods would impute the value of T based on the value of S. In this way the composite endpoint of DFS can be viewed as an imputation strategy that replaces missing values of T with S. From the perspective of auxiliary variables this is clearly biased, but in this setting it might be reasonable for two reasons. First, S and T are highly correlated, so we might expect S to be a good prediction of T. Second, DFS as an endpoint has a clinically meaningful interpretation. Thinking of surrogate markers as auxiliary information would seem to be a strategy that could keep discussants such as K and BIK happy, because we still use the real endpoint if it is available but would allow for information in S to be utilized. If S were only weakly related to T then there would be little gain in efficiency. By contrast, if S were strongly related to T then there are potential gains in efficiency.
In closing, we would like to stress that if the goal is to identify surrogate endpoints that occur before the true endpoint so that trials can be done more quickly, then this will necessitate accepting a greater level of uncertainty. There are aspects based on the semi-competing risks framework that allow for this, but by no means is this the only type of methodology available. The question then becomes how much are you willing to lean on the knowledge from biology and from data from other trials to help control this uncertainty. How to do this is where the role of statistics is crucial.
