Sample-size Formula for Case-cohort Studies

Kiyoshi Kubota, Akira Wakana

doi:10.1097/ede.0b013e3182087650

Abstract

To the Editor: The case-cohort design is an efficient alternative to the full cohort design. Compared with the case-control study nested within the cohort, the case-cohort design offers flexibility for a series of exploratory analyses because a single subcohort is used to analyze multiple outcomes [1,2]. This feature is of particular importance in some types of research, including pharmacoepidemiology studies. For example, the design can be used to evaluate the association between a single specific drug and multiple adverse events, where the association with some of the events is often unknown or little understood at the beginning of the study. Nevertheless, the case-cohort design has not often been employed. One reason hindering its wide use may be the scarcity of information essential for planning individual studies, including sample size calculation.

Recently, Cai and Zeng [3,4] presented a method for power and sample size calculation as a natural generalization of the log-rank test in the full cohort design. We show a simple sample size formula for the case-cohort design that can be interpreted as a straightforward extension of the conventional sample size formula for the cohort study.

Let Nfull denote the sample size needed for the cohort study, and let N1full (N0full) be the size of the exposed (unexposed) population in the full cohort, so that Nfull = (1 + K)N1full, where K = N0full/N1full. Let RR be the relative risk, that is, the ratio of the risk (incidence proportion) in the exposed (P1) to that in the unexposed (P0), RR = P1/P0, and let PD be the common estimate of the incidence proportion under the null hypothesis, PD = (N1fullP1 + N0fullP0)/Nfull = P0(RR + K)/(1 + K). Based on the conventional sample size formula for the cohort study, with two-sided significance level α and power 1 − β,

N1full = (zα/2 √A + zβ √B)² / C²

where zc is the (1 − c)th quantile of the standard normal distribution, A = (1 + 1/K)PD(1 − PD), B = RR · P0(1 − RR · P0) + P0(1 − P0)/K, and C = P0(RR − 1).
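The full-cohort calculation defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the conventional formula takes the form N1full = (zα/2 √A + zβ √B)² / C² with a two-sided α, and the function name `full_cohort_size` and its keyword arguments are hypothetical.

```python
import math
from statistics import NormalDist


def full_cohort_size(p0, rr, k, alpha=0.05, beta=0.2):
    """Return (N1full, Nfull), unrounded, for a full cohort study.

    Assumption: N1full = (z_{alpha/2}*sqrt(A) + z_beta*sqrt(B))**2 / C**2,
    with a two-sided alpha. A, B, C and PD follow the definitions in the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2)th normal quantile
    z_beta = NormalDist().inv_cdf(1 - beta)        # (1 - beta)th normal quantile
    p_d = p0 * (rr + k) / (1 + k)                  # pooled incidence proportion under H0
    a = (1 + 1 / k) * p_d * (1 - p_d)
    b = rr * p0 * (1 - rr * p0) + p0 * (1 - p0) / k
    c = p0 * (rr - 1)
    n1_full = (z_alpha * math.sqrt(a) + z_beta * math.sqrt(b)) ** 2 / c ** 2
    return n1_full, (1 + k) * n1_full              # Nfull = (1 + K) * N1full


n1_full, n_full = full_cohort_size(p0=0.001, rr=4, k=3)
print(math.ceil(n_full))  # 9986
```

With (P0, RR, K, α, β) = (0.001, 4, 3, 0.05, 0.2), rounding Nfull up gives 9,986. Rounding up only at the final step is a choice made for this sketch; the letter does not state its rounding rule.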
Using m, the ratio of the size of the subcohort to the number of cases in the entire cohort, the entire size of the case-cohort study, N, is simply formulated as

N = (1 + 1/m)Nfull.

Of note, m should be assigned by the researcher who is planning the study. A simulation study using a model subject to time-to-event analysis [5] revealed that the proposed sample size yielded satisfactory empirical power and a satisfactory empirical type I error rate.

Let ndetail denote the number of subjects for whom detailed information on covariates is collected (ie, subcohort members and/or cases). For a single event, ndetail is smallest when m = 1; for multiple events, however, ndetail is smallest when m is larger than 1. In general, with a larger m, the size of the entire cohort N is closer to Nfull, but ndetail is larger. To achieve a good balance between N and ndetail, m = 3 to 5 may be adopted on many occasions. For example, (N, ndetail) = (19,972, 70) and (11,984, 126) for m = 1 and 5, respectively, when (P0, RR, K, α, β) = (0.001, 4, 3, 0.05, 0.2). In actual situations, if measuring all or some of the covariates is quite costly, the value of ndetail may be minimized by adjusting m within the available resources.

Details on the derivation of the formula and on the simulation are available in the eAppendix (https://links.lww.com/EDE/A449).

Kiyoshi Kubota
Akira Wakana
Department of Pharmacoepidemiology
Faculty of Medicine
University of Tokyo
Bunkyo-ku, Tokyo
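The worked example in the letter, (N, ndetail) for m = 1 and m = 5, can be checked with a short Python sketch. It rests on several assumptions that are not stated in the text: that N = (1 + 1/m)Nfull, that ndetail can be approximated as (1 + m) times the expected number of cases N · PD (subcohort plus cases, ignoring their small overlap), and that values are rounded up. The function names are hypothetical, and Nfull = 9,986 is taken as the full-cohort size implied by the m = 1 example (half of 19,972).

```python
import math


def case_cohort_size(n_full, m):
    """Entire cohort size N, assuming N = (1 + 1/m) * Nfull, rounded up."""
    return math.ceil(n_full * (m + 1) / m)


def detailed_subjects(n, p_d, m):
    """Approximate ndetail as subcohort plus cases, ignoring their overlap.

    Assumption: expected cases are N * PD and the subcohort is m times that,
    so ndetail is about (1 + m) * N * PD, rounded up.
    """
    return math.ceil((1 + m) * n * p_d)


p0, rr, k = 0.001, 4, 3
p_d = p0 * (rr + k) / (1 + k)  # pooled incidence proportion, 0.00175
n_full = 9986                  # assumed: implied by N = 19,972 at m = 1

for m in (1, 5):
    n = case_cohort_size(n_full, m)
    print(m, n, detailed_subjects(n, p_d, m))
# prints:
# 1 19972 70
# 5 11984 126
```

Under these assumptions ndetail is proportional to (m + 1)²/m for a single event, which is smallest at m = 1, in line with the statement in the letter.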
