Reuse of Controls in Nested Case-Control Studies

Nathalie C Støer, Haakon E Meyer, Sven Ove Samuelsen

doi:10.1097/ede.0000000000000057

Abstract

To the Editor:

A nested case-control design with risk set sampling [1,2] matches controls to cases on time and often on additional factors. This matching has been thought to make the controls unusable for other endpoints. However, recent methods [3–5] enable the reuse of controls, thereby improving efficiency. We demonstrate this in an example, assuming proportional hazards models for time to event and estimating hazard ratios (HRs).

We consider inverse probability–weighted Cox regression models in which each sampled control is weighted by 1.0 divided by its probability of ever being sampled as a control. A subject can be sampled at each event time at which the subject is at risk and meets the matching criteria, and thus typically on several occasions. We estimate these probabilities using two methods.

For the Kaplan-Meier method [3,4], note that the probability of ever being sampled is 1.0 minus the probability of never being sampled, and the latter is the product of the probabilities of not being sampled at each possible event time. This leads to the formula

$$p_j = 1 - \prod_i \left[ 1 - \frac{m_i}{n_i}\, I(\text{subject } j \text{ is at risk at } t_i \text{ and meets the matching criteria}) \right],$$

which is similar to the Kaplan-Meier estimator, for the sampling probabilities $p_j$. Here $n_i$ and $m_i$ are the numbers of possible and sampled controls for the case at time $t_i$, respectively, and $I(\cdot)$ is an indicator function that is either 0 or 1.

For the second method [5,6], referred to as generalized linear model weights, we consider indicators of ever being sampled as a control among all noncases. The sampling probabilities are estimated using logistic regression with entry time, censoring time, and matching variables as covariates. See the eAppendix (https://links.lww.com/EDE/A762) for more details.

We applied inverse probability weighting in a study of serum 25-hydroxyvitamin D (s-25(OH)D) and prostate cancer [7] to evaluate this method. The cohort consisted of participants in health surveys in Norway, comprising 116,493 men. Among those, 2,118 were diagnosed with prostate cancer during follow-up.
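As an illustration of the Kaplan-Meier-type weights described above, the following Python sketch computes the probability of ever being sampled as one minus the product, over the event times at which a subject is eligible, of one minus the ratio of sampled to possible controls. It is a minimal sketch under stated assumptions, not the implementation used in the study: the names `Subject`, `CaseSet`, and `km_inclusion_probability` are hypothetical, matching is reduced to a single categorical variable, the handling of ties (strict inequalities) is an assumed convention, and the numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    entry: float   # time at which follow-up starts
    exit: float    # time of event or censoring
    stratum: str   # hypothetical single matching category


@dataclass
class CaseSet:
    time: float      # event time of the case
    stratum: str     # matching category of the case
    n_eligible: int  # number of possible controls for this case
    n_sampled: int   # number of controls actually sampled


def km_inclusion_probability(subject, case_sets):
    """Return the probability that `subject` is ever sampled as a control."""
    prob_never_sampled = 1.0
    for cs in case_sets:
        # Assumed convention: eligible if under follow-up strictly around
        # the event time and in the same matching category as the case.
        eligible = (subject.entry < cs.time < subject.exit
                    and subject.stratum == cs.stratum)
        if eligible:
            prob_never_sampled *= 1.0 - cs.n_sampled / cs.n_eligible
    return 1.0 - prob_never_sampled


# Invented example: four cases, one control sampled per case.
case_sets = [
    CaseSet(time=2.0, stratum="A", n_eligible=4, n_sampled=1),
    CaseSet(time=5.0, stratum="B", n_eligible=5, n_sampled=1),   # other stratum
    CaseSet(time=7.0, stratum="A", n_eligible=2, n_sampled=1),
    CaseSet(time=12.0, stratum="A", n_eligible=3, n_sampled=1),  # after exit
]
control = Subject(entry=0.0, exit=10.0, stratum="A")

p = km_inclusion_probability(control, case_sets)  # 1 - 0.75 * 0.5 = 0.625
weight = 1.0 / p                                  # 1.6
print(p, weight)
```

In this invented example the subject is eligible at two of the four event times, so the probability of never being sampled is 0.75 × 0.5 = 0.375 and the resulting weight is 1/0.625 = 1.6. The weight is only needed for subjects who were actually sampled as controls.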
For each incident case, one control was sampled from the case’s risk set, matched on age at serum sampling (±6 months), date of serum sampling (±2 months), and county of residence. Meyer et al [7] focused on the association between s-25(OH)D and incidence of prostate cancer. Because of the increased practice of screening for this cancer, death from prostate cancer might be a better endpoint when considering the most serious cases. Among the incident cases, 367 men died from prostate cancer. Traditional analysis of nested case-control data can use only the controls matched to incident cases who also died from prostate cancer, whereas all sampled controls can be used with inverse probability–weighted analysis. Robust variance estimation, possibly slightly conservative [6], was chosen for the present analyses.

The Table displays results from traditional analyses and inverse probability–weighted analyses with Kaplan-Meier and generalized linear model weights. For both endpoints, the hazard ratios from inverse probability weighting and traditional analyses were approximately equal. The inverse probability–weighted standard errors for the incidence endpoint were somewhat smaller than the standard error from the traditional analysis. In contrast, the inverse probability–weighted standard errors for the death endpoint were considerably smaller. Because all available controls could be used, the efficiency increases.

TABLE: Comparison of the Traditional Estimation Method and Inverse Probability Weighting

We also analyzed a physical activity variable available in the complete cohort (n = 116,493) to contrast cohort and nested case-control analyses. The endpoint was incidence of prostate cancer. Comparing moderate activity with being sedentary, the HR from the cohort data was 1.07 (95% confidence interval = 0.95–1.22), compared with 1.09 (0.92–1.29) using traditional nested case-control analysis, and 1.07 (0.90–1.26) and 1.01 (0.85–1.21) using generalized linear model and Kaplan-Meier weights, respectively.
Hence, the traditional estimates are not necessarily closer to cohort estimates than inverse probability–weighted estimates (see eAppendix [http://links.lww.com/EDE/A762] for full analysis).

Our experience suggests that Kaplan-Meier and generalized linear model weights perform similarly, with comparable estimated HRs and variances. However, with extremely close matching, simulations indicate that biased estimates can occur when applying Kaplan-Meier weights [8]. We have demonstrated that inverse probability weighting can be a powerful alternative with sub-endpoints. Moreover, reuse of controls can be helpful in many multiple-outcome settings. The eAppendix (https://links.lww.com/EDE/A762) gives an example of inverse probability weighting for specific metastasis groups.

ACKNOWLEDGMENTS

We thank Tone Bjørge and the Janus serum bank for making this study possible.

Nathalie C. Støer
Department of Mathematics
Faculty of Mathematics and Natural Sciences
University of Oslo
Oslo, Norway

Haakon E. Meyer
Department for Chronic Diseases
Division of Epidemiology
Norwegian Institute of Public Health
Oslo, Norway

Sven Ove Samuelsen
Department of Mathematics
Faculty of Mathematics and Natural Sciences
University of Oslo
Oslo, Norway
