Background: The two-phase design is increasingly used in health research. In Phase 1, broad data are collected on a large sample; in Phase 2, the correlation between auxiliary variables and an expensive variable is used to select a smaller but more informative subsample. Objectives: Previous research on two-phase designs has focused primarily on binary, continuous, and time-to-event outcomes. To address the lack of research on ordinal outcomes, we propose a novel approach for three logit models with proportional odds: cumulative logit (CL), adjacent category (AC), and stopping ratio (SR). We also examine the validity of our method in four simulation scenarios with various outcome distributions. Methods: We developed a semiparametric maximum likelihood approach that incorporates Phase 1 data. An expectation-maximization (EM) algorithm was used to obtain the estimates, and the Louis method was used to calculate the covariance matrix. We compared the estimates from our method with those obtained using only Phase 2 data, under both simple random sampling and balanced outcome-dependent sampling (ODS). Results: In all experiments, balanced ODS led to reduced bias and higher relative efficiency, and the advantage was more pronounced when sampling probabilities varied more across outcome categories. The EM method improved results under balanced ODS for the CL and SR models but was less effective for the AC model. Conclusions: These findings suggest that our method, when used with balanced ODS, can effectively incorporate Phase 1 data to improve the quality of statistical estimates.
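For orientation, the three proportional-odds logit models named above are commonly written as follows. This is a sketch in standard textbook notation, not necessarily the parameterization used in the paper: the symbols Y, J, x, \alpha_j, and \beta are assumptions introduced here, and sign conventions differ between references and software.

```latex
% Assumed notation (not taken from the abstract):
%   Y in {1, ..., J}  ordinal outcome
%   x                 covariate vector
%   \alpha_j          category-specific intercepts
%   \beta             slope vector shared across all j (the proportional odds assumption)
\begin{align*}
\text{CL:}\quad & \log\frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \alpha_j + \beta^{\top} x, \\
\text{AC:}\quad & \log\frac{P(Y = j \mid x)}{P(Y = j+1 \mid x)} = \alpha_j + \beta^{\top} x, \\
\text{SR:}\quad & \log\frac{P(Y = j \mid x)}{P(Y > j \mid x)} = \alpha_j + \beta^{\top} x,
\qquad j = 1, \dots, J-1.
\end{align*}
```

The three forms differ only in which pair of events the log odds compare: CL contrasts all categories up to j with all categories above j, AC contrasts category j with its immediate neighbour, and SR contrasts stopping at j with continuing beyond j (equivalently, the logit of P(Y = j | Y >= j, x)).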