Abstract

The two-phase design has recently received attention in the statistical literature as an extension to the traditional case-control study for settings where a predictor of interest is rare or subject to misclassification. Despite a thorough methodological treatment and the potential for substantial efficiency gains, the two-phase design has not been widely adopted. This may be due, in part, to a lack of general-purpose, readily available software. The osDesign package for R provides a suite of functions for analyzing data from a two-phase and/or case-control design, as well as for evaluating operating characteristics, including bias, efficiency and power. The evaluation is simulation-based, permitting flexible application of the package to a broad range of scientific settings. Using lung cancer mortality data from Ohio, the package is illustrated with a detailed case study in which two statistical goals are considered: (i) the evaluation of small-sample operating characteristics for two-phase and case-control designs and (ii) the planning and design of a future two-phase study.

Highlights

  • Researchers have at their disposal a wide variety of study designs with which to identify and assess associations between predictors and outcomes

  • At least for these examples, power gains under a two-phase design are primarily obtained for the coefficient corresponding to the covariate that is stratified upon

  • Although the two-phase design is well established in the statistical literature, researchers have been slow to adopt it as an efficient alternative to the traditional case-control design


Summary

Introduction

Researchers have at their disposal a wide variety of study designs with which to identify and assess associations between predictors and outcomes. For the setting where the exposure is binary, White (1982) proposed using the two-phase design as a means of improving efficiency (Neyman 1938). In this context, the design stratifies the population jointly on the outcome and exposure, resulting in four phase I strata. In this article we introduce and provide an overview of a new R (R Development Core Team 2011) package, osDesign, that contains a suite of functions useful when designing and analyzing two-phase and case-control studies.
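The joint stratification described above can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use osDesign or reproduce its interface. The function names, the simulated population, the prevalences (5% outcome, 10% exposure) and the phase II sample size of 50 per stratum are all hypothetical choices made for illustration only.

```python
import random
from collections import Counter


def phase_one_strata(records):
    """Cross-classify (outcome, exposure) pairs into the four phase I strata."""
    counts = Counter(records)
    # Always report all four cells, even if one happens to be empty.
    return {(y, x): counts.get((y, x), 0) for y in (0, 1) for x in (0, 1)}


def phase_two_sample(records, n_per_stratum, seed=0):
    """Draw a simple random subsample of record indices within each stratum."""
    rng = random.Random(seed)
    by_stratum = {}
    for idx, key in enumerate(records):
        by_stratum.setdefault(key, []).append(idx)
    sample = {}
    for key, members in by_stratum.items():
        # A stratum smaller than the target size is sampled in full.
        k = min(n_per_stratum, len(members))
        sample[key] = sorted(rng.sample(members, k))
    return sample


# Hypothetical population: binary outcome y and binary exposure x, both rare.
rng = random.Random(42)
population = [
    (int(rng.random() < 0.05), int(rng.random() < 0.10)) for _ in range(10000)
]

strata = phase_one_strata(population)             # four phase I counts
sub = phase_two_sample(population, n_per_stratum=50)

for key in sorted(strata):
    print(key, "phase I:", strata[key], "phase II:", len(sub.get(key, [])))
```

Sampling equal numbers from each of the four strata over-represents the rare cells (exposed cases in particular) relative to simple random sampling, which is the intuition behind the efficiency gains the design aims for. A balanced allocation is only one possible choice of phase II sample sizes.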

The two-phase design
Data collection scheme
Two-phase likelihood
Estimation and inference
Design considerations
Software
Evaluating operating characteristics via simulation
Algorithm
Simulation functions
Example
Marginal exposure distribution
Model specification
Design choice
Small-sample operating characteristics
A single two-phase design
Comparing specific designs
All two-phase designs
Case-control sampling at phase I
Traditional case-control designs
Power calculations for study design
Expected phase I strata
Power for a two-phase study
Comparing two-phase designs
Power for case-control studies
Modifying anticipated effect sizes
Run times and Monte Carlo error
Findings
Summary