Abstract

SUMMARY It is well known that it is valid to treat case-control data as if they had been obtained by random sampling when estimating the odds-ratio parameters of logistic regression models. The result has been generalised to estimation of the regression parameters in multiplicative intercept models, and it has further been pointed out that the approach may be applied to non-multiplicative intercept models if the parameter space is first enlarged to include a multiplicative intercept term. This note is concerned with the efficiency of the approach.

In epidemiological case-control studies, it is common to model the influence of predictors on the risk of disease through a finite dimensional regression parameter, while treating the marginal distribution of the predictors as an infinite dimensional nuisance parameter. See, for example, Breslow & Day (1980). It is well known that case-control data may be treated as if they had been obtained by random sampling when estimating the regression parameters in a logistic regression model. In a multiplicative intercept model the log odds of disease is modelled as a known function of the regression parameters and the predictors, plus a variation-independent intercept parameter. It is also known that the result in Breslow & Day (1980) holds more generally for estimating parameters other than the intercept in any multiplicative intercept model. See, for example, Prentice & Pyke (1979), Hsieh, Manski & McFadden (1985), Anderson (1972), Wacholder & Weinberg (1993), Robins, Rotnitzky & Zhao (1994) or Scott & Wild (1989). With non-multiplicative intercept models, the data may be treated as if they had been obtained by random sampling if the model for the risk of disease is first replaced in the likelihood by a larger model that has the multiplicative intercept property.
The larger model is defined implicitly by treating the logit of the risk of disease as if it were equal to the logit of the risk of disease in the original model, plus an unspecified intercept term. See, for example, Scott & Wild (1986), Robins & Blevins (1987), Breslow & Storer (1985) or Storer, Wacholder & Breslow (1983).

The results of Cosslett (1981) imply that joint maximisation of the likelihood for the marginal distribution of the predictors and the regression parameters results in efficient estimates of the regression parameters in the asymptotic setting where the numbers of cases and controls tend to infinity together. Scott & Wild (1986) note that, with randomly sampled data in multiplicative intercept models, the case-control indicators are ancillary for the regression coefficients. Thus joint maximisation of the semiparametric likelihood for the case-control data and maximisation of the likelihood obtained by treating the data as if they were randomly sampled both result in the same estimate of the regression parameters. J. Robins (personal communication) has pointed out that the argument may be applied to show efficiency of the approach in non-multiplicative intercept models as well, because the case-control likelihood in the larger model and the case-control likelihood in the smaller model are equivalent. This note presents a direct calculation that demonstrates the efficiency result both for multiplicative and non-multiplicative intercept models.
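
The starting point of the discussion, that case-control data can be analysed as if randomly sampled in a logistic regression model, can be illustrated with a small simulation. The sketch below is not taken from the note: the parameter values, sample sizes, single normal predictor and the helper names `expit` and `fit_logistic` are all assumptions chosen for illustration. It covers only the ordinary logistic case and checks that the slope is recovered while the fitted intercept is shifted by the log ratio of the sampling fractions; it does not demonstrate the efficiency result itself.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, n_iter=25):
    """Maximum likelihood for logit P(D=1|x) = a + b*x via Newton-Raphson.

    Illustrative helper (not from the note); returns array [a_hat, b_hat].
    """
    design = np.column_stack([np.ones(len(y)), x])
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ coef)
        score = design.T @ (y - p)
        info = (design * (p * (1.0 - p))[:, None]).T @ design
        coef = coef + np.linalg.solve(info, score)
    return coef

rng = np.random.default_rng(0)

# Hypothetical "true" parameters of the population disease model.
alpha_true, beta_true = -4.0, 0.7

# Simulate a large source population with a rare disease.
n_pop = 400_000
x = rng.normal(size=n_pop)
d = rng.binomial(1, expit(alpha_true + beta_true * x))

# Case-control sampling: fixed numbers of cases and controls.
n_cases = n_controls = 2000
case_pool = np.flatnonzero(d == 1)
control_pool = np.flatnonzero(d == 0)
cases = rng.choice(case_pool, size=n_cases, replace=False)
controls = rng.choice(control_pool, size=n_controls, replace=False)
idx = np.concatenate([cases, controls])

# Fit the logistic model as if the sample were a random sample.
alpha_hat, beta_hat = fit_logistic(x[idx], d[idx])

# The intercept absorbs the log ratio of the two sampling fractions.
offset = np.log((n_cases / case_pool.size) / (n_controls / control_pool.size))

print("slope estimate:", beta_hat, "true slope:", beta_true)
print("intercept estimate:", alpha_hat, "true intercept + offset:", alpha_true + offset)
```

In this sketch the intercept estimate is not interpretable as the population intercept, which is why the text speaks of estimating parameters other than the intercept.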
