Impact of the Scientific Registry of Transplant Recipients’ New Bayesian Method on Estimating Center Effects and Flagging of Centers as Worse Than Expected

W. K. Kremers

doi:10.1111/ajt.12813

Abstract

To the Editor: In two recent articles [1, 2], the Scientific Registry of Transplant Recipients (SRTR) describes a new Bayesian methodology for estimating center-specific hazard ratios (HRs) that compare graft loss and mortality rates to the national average. Further clarification can be drawn from the SRTR's example reports generated using data from the July 2012 SRTR program-specific reports [3]. The SRTR uses a Bayesian method that assumes the prior distribution for the center-specific HRs follows a gamma distribution with α = β = 2. From this model, the new SRTR method estimates HRs using the formula (O + 2)/(E + 2), where O is the observed number of events and E is the expected number. In statistical terms, this method shrinks the HR estimate toward 1 relative to the naïve estimate of O/E by combining information from the individual center with the average HR from all centers across the country. Still, the SRTR prior has a standard deviation (SD) of 0.71, which is meaningfully and statistically significantly larger than that suggested by national data for liver transplantation (SD = 0.31, 95% confidence interval (CI) 0.24 to 0.41) [4] and lung transplantation (SD = 0.37) [5]. The SRTR suggests that this prior, with an SD twice that exhibited in the data, is favorable because estimates based upon a prior with the true SD would shrink the HRs too much toward 1. Although the SRTR's prior with the larger SD may seem intuitive and could have some favorable operating characteristics, theory states that Bayesian and hierarchical methods yield better estimates when using the “true” prior, that is, the one with the smaller SD suggested by the data [6]. With the new Bayesian method, centers are flagged as potentially underperforming if either P(HR > 1.2) > 0.75 or P(HR > 2.5) > 0.10 [2, 3, 7].
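The estimate and flagging probabilities described above can be sketched numerically. This is a minimal illustration, not the SRTR's implementation: it assumes that O given the HR is Poisson with mean HR × E (an assumption not stated in this letter, but one under which a gamma prior with α = β = 2 gives a gamma posterior with shape O + 2 and rate E + 2, whose mean is the formula (O + 2)/(E + 2) quoted above). All function names are hypothetical.

```python
import math

PRIOR_SHAPE = 2  # alpha of the gamma prior
PRIOR_RATE = 2   # beta of the gamma prior

# SD of a gamma(alpha, beta) distribution is sqrt(alpha) / beta, about 0.71 here
PRIOR_SD = math.sqrt(PRIOR_SHAPE) / PRIOR_RATE


def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu), with integer k and mu > 0."""
    if k < 0:
        return 0.0
    log_mu = math.log(mu)
    # Work on the log scale so large mu does not underflow exp(-mu)
    terms = [math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1)]
    return min(1.0, math.fsum(terms))


def posterior_mean_hr(observed, expected):
    """Shrunken HR estimate (O + 2) / (E + 2)."""
    return (observed + PRIOR_SHAPE) / (expected + PRIOR_RATE)


def prob_hr_exceeds(threshold, observed, expected):
    """P(HR > threshold | O, E) under the assumed gamma(O + 2, E + 2) posterior.

    For an integer shape k, the gamma upper tail at x with rate r equals
    P(Poisson(r * x) <= k - 1), so no special-function library is needed.
    """
    shape = observed + PRIOR_SHAPE
    rate = expected + PRIOR_RATE
    return poisson_cdf(shape - 1, rate * threshold)


def bayes_flag(observed, expected):
    """Flag if P(HR > 1.2) > 0.75 or P(HR > 2.5) > 0.10."""
    return (prob_hr_exceeds(1.2, observed, expected) > 0.75
            or prob_hr_exceeds(2.5, observed, expected) > 0.10)
```

For example, `posterior_mean_hr(10, 5)` returns 12/7, which is pulled toward 1 from the naïve O/E of 2. A prior with a smaller SD (larger α = β) would pull the estimate further toward 1.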
Because the Bayesian flagging probabilities, as well as the (frequentist) p-value ≤ 0.05 component of the historical method, can be expressed as functions of O and E alone, one can directly compare the Bayesian and historical methods as functions of O and E. The p-value here comes from the one-sided statistical test and is the probability of observing at least that many events if the true HR were actually 1. As shown in Figure 1, for centers with E < 19, the new SRTR method flags centers with the same number of events as, or one fewer event than, the rule based upon p-value ≤ 0.05. For centers with E ≥ 19, the new SRTR method requires as many or more events than the rule based upon p-value ≤ 0.05. In contrast, for all E, the new SRTR method flags centers with the same number of events, or fewer, and sometimes many fewer, than the historical method, which flags only if p-value ≤ 0.05 and (O/E) > 1.5 and (O − E) > 3. In particular, all centers that would be flagged by the historical method will also be flagged by the Bayesian method. Additionally, some centers that would only nearly be flagged by the historical method will be flagged by the new Bayesian method. An advantage of the gamma prior over the normal distribution for log(HR) [4, 5], or similar methods used by others [8], is the ease of numerical calculation. Still, the SRTR's choice of a gamma distribution for modeling center effects is questionable because the gamma distribution is not well centered near 1. Yet more problematic may be the choice of a gamma distribution with an SD much larger than that suggested by national data. The SRTR argues that the Bayesian estimates, along with a new set of criteria for flagging centers, perform better than those of the historical method [2, 7]. However, the superiority of their prior, and hence of their HR estimate, has not been validated, as pointed out in a recent editorial [9].
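The comparison of the two rules as functions of O and E can be reproduced in outline. The sketch below is illustrative only and rests on stated assumptions: the one-sided p-value is taken as P(N ≥ O) for N Poisson with mean E, and the Bayesian posterior is taken as gamma with shape O + 2 and rate E + 2. The function names are hypothetical and are not the SRTR's code.

```python
import math


def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu), with integer k and mu > 0."""
    if k < 0:
        return 0.0
    log_mu = math.log(mu)
    terms = [math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1)]
    return min(1.0, math.fsum(terms))


def historical_flag(observed, expected):
    """Historical rule: p-value <= 0.05 and O/E > 1.5 and O - E > 3."""
    # Assumed one-sided p-value: P(N >= O) when the true HR is 1
    p_value = 1.0 - poisson_cdf(observed - 1, expected)
    return (p_value <= 0.05
            and observed > 1.5 * expected
            and observed - expected > 3)


def bayes_flag(observed, expected):
    """Bayesian rule: P(HR > 1.2) > 0.75 or P(HR > 2.5) > 0.10."""
    shape = observed + 2   # posterior shape under the assumed model
    rate = expected + 2    # posterior rate under the assumed model
    # Gamma upper tail with integer shape, written as a Poisson CDF
    p_above_1_2 = poisson_cdf(shape - 1, rate * 1.2)
    p_above_2_5 = poisson_cdf(shape - 1, rate * 2.5)
    return p_above_1_2 > 0.75 or p_above_2_5 > 0.10


def min_events_flagged(rule, expected, max_events=2000):
    """Smallest O at which `rule` flags a center with the given E, else None."""
    for observed in range(max_events + 1):
        if rule(observed, expected):
            return observed
    return None


if __name__ == "__main__":
    for expected in (1.0, 5.0, 20.0, 50.0):
        print(expected,
              min_events_flagged(historical_flag, expected),
              min_events_flagged(bayes_flag, expected))
```

Tabulating `min_events_flagged` for each rule over a grid of E gives the kind of threshold curves that a figure such as Figure 1 would display. Wherever the Bayesian threshold is at or below the historical one, every center flagged historically is also flagged by the Bayesian rule.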
In their articles, the SRTR chooses a set of “optimal” flagging criteria based upon the gamma prior with SD = 0.71, but does not consider different priors in the optimization of this process. Further, similar flagging results could be obtained by simply adopting new flagging criteria applied to the historical model. Alternative Bayesian or mixed-effects models should be considered that are consistent with and supported by national data and that better describe center-specific HRs, and corresponding flagging criteria should be chosen that appropriately identify poorly performing centers.

W. K. Kremers*
Department of Health Sciences Research and The William J. von Liebig Transplant Center, Mayo Clinic, Rochester, MN
*Corresponding author: Walter K. Kremers, kremers.walter@mayo.edu

The author of this manuscript has no conflicts of interest to disclose as described by the American Journal of Transplantation.
