Abstract

Fears and Brown (1986) developed a procedure for logistic regression analysis of stratified case-control data where the sampling fractions for cases and controls, and thus their population frequencies, were assumed known. They fitted the usual prospective model to the case-control data, treating case-control status as a binary outcome variable. In order to adjust for the biased sampling, they included the logarithm of the odds ratio relating the actual sample sizes and the population frequencies in each stratum as an offset in the regression equation. However, no adjustments were made to the estimated variances of the regression coefficients of variables associated with the strata to account for the information about them available in the population frequencies. Furthermore, Fears and Brown incorrectly claimed that their procedure gave restricted maximum likelihood (RML) estimates (Aitchison and Silvey, 1958) based on the likelihood of the retrospectively sampled data. Breslow and Cain (1988) showed that the Fears and Brown procedure does yield consistent and asymptotically normal estimates of the regression parameters in a logistic regression model for the probability of disease development. In fact, it is equivalent to the conditional maximum likelihood (CML) estimate developed by Manski and McFadden (1981) for estimation of quantal response functions from stratified data. (See also Hsieh, Manski, and McFadden, 1985.) Breslow and Cain extended the work of Manski and McFadden for use in the more realistic situation where the distribution of cases and controls in each stratum is estimated from a sample rather than being assumed known. They developed variance estimators for the regression coefficients that accurately reflect the additional information available in the first-stage sample and that are easily modified to accommodate an infinite population at that stage.
We first present a reanalysis of the Fears and Brown data that contrasts the correct variances, computed under the assumptions that the first-stage sample is finite and infinite, respectively, with the incorrect variances obtained from the standard logistic analysis. Then, using a subset of the data with only three strata, we demonstrate the differences between the CML estimates of Manski and McFadden and RML estimates calculated from the retrospective probabilities. A small-scale simulation study investigates the properties of CML and RML estimators in samples of moderate size.
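
The offset adjustment described above can be illustrated with a minimal sketch. It is not the authors' code and does not use the Fears and Brown data: the cell counts, population frequencies, and the function names `stratum_offsets` and `fit_logistic_with_offset` are all hypothetical, and the offset is taken to be the log ratio of the case and control sampling fractions in each stratum, which is one reading of the description in the abstract. The sketch reproduces only the point estimates; it does not implement the Breslow and Cain variance estimators.

```python
import numpy as np

def stratum_offsets(n_cases, n_controls, N_cases, N_controls):
    """Per-stratum offset: log of (case sampling fraction / control sampling fraction)."""
    n1, n0 = np.asarray(n_cases, float), np.asarray(n_controls, float)
    N1, N0 = np.asarray(N_cases, float), np.asarray(N_controls, float)
    return np.log((n1 / N1) / (n0 / N0))

def fit_logistic_with_offset(X, y, offset, n_iter=50, tol=1e-10):
    """Plain Newton-Raphson fit of logit P(y = 1) = offset + X @ beta (no safeguards)."""
    # Start from coefficients that roughly cancel the offset, so the first
    # linear predictor is close to the overall sample log-odds.
    ybar = y.mean()
    start = np.log(ybar / (1.0 - ybar)) - offset
    beta = np.linalg.lstsq(X, start, rcond=None)[0]
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(offset + X @ beta)))
        score = X.T @ (y - p)                        # gradient of the log-likelihood
        info = (X * (p * (1.0 - p))[:, None]).T @ X  # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical second-stage sample: (stratum, exposed, is_case, count).
cells = [(0, 1, 1, 30), (0, 0, 1, 20), (0, 1, 0, 15), (0, 0, 0, 35),
         (1, 1, 1, 25), (1, 0, 1, 25), (1, 1, 0, 10), (1, 0, 0, 40)]
counts = [c[3] for c in cells]
stratum = np.repeat([c[0] for c in cells], counts)
exposed = np.repeat([c[1] for c in cells], counts)
y = np.repeat([c[2] for c in cells], counts).astype(float)

# Design matrix: intercept, stratum indicator, exposure.
X = np.column_stack([np.ones(len(y)), stratum, exposed]).astype(float)

# Hypothetical population frequencies, assumed known, for strata 0 and 1.
offsets_by_stratum = stratum_offsets(
    n_cases=[50, 50], n_controls=[50, 50],
    N_cases=[500, 250], N_controls=[50000, 10000],
)
offset = offsets_by_stratum[stratum]

beta = fit_logistic_with_offset(X, y, offset)
beta_no_offset = fit_logistic_with_offset(X, y, np.zeros(len(y)))
print("with offset:   ", beta)
print("without offset:", beta_no_offset)
```

In this toy model the strata enter as a full set of indicators, so the offset changes only the intercept and stratum coefficients and leaves the exposure coefficient unchanged. The standard errors from such a fit would be the unadjusted ones that the abstract describes as incorrect for stratum-related coefficients.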
