Abstract

Under cohort sampling designs, additional covariate data are collected on cases of a specific type and a randomly selected subset of noncases, primarily for the purpose of studying associations with a time-to-event response of interest. Once such data are available, it may be of interest to reuse them for studying associations between the additional covariate data and a secondary non-time-to-event response variable, usually collected for the whole study cohort at the outset of the study. Following earlier literature, we refer to such a situation as secondary analysis. We outline a general conditional likelihood approach for secondary analysis under cohort sampling designs and discuss the specific situations of case-cohort and nested case-control designs. We also review alternative methods based on full likelihood and inverse probability weighting. We compare the alternative methods for secondary analysis in two simulated settings and apply them in a real-data example.

Highlights

  • Cohort sampling designs are two-phase epidemiological study designs. In the first phase, information on time-to-event outcomes of interest over a follow-up period and some basic covariate data are collected on the whole study group, referred to as a cohort; in the second phase, more expensive or difficult-to-obtain additional covariate data are collected only on a subset of the study cohort.

  • Examples are the case-cohort [1–3] and nested case-control [4, 5] designs. Such designs are applied for the purpose of studying associations between the time-to-event outcomes and the covariates collected in the second phase.

  • Conditional likelihood inference under cohort sampling designs has been studied previously for the analysis of the primary time-to-event outcome by Langholz and Goldstein and by Saarela and Kulathinal; here, we extend these methods to the secondary analysis setting.


Summary

Introduction

Cohort sampling designs are two-phase epidemiological study designs. In the first phase, information on time-to-event outcomes of interest over a follow-up period and some basic covariate data are collected on the whole study group, referred to as a cohort; in the second phase, more expensive or difficult-to-obtain additional covariate data are collected only on a subset of the study cohort. Conditional likelihood inference under cohort sampling designs has been studied previously for the analysis of the primary time-to-event outcome by Langholz and Goldstein and by Saarela and Kulathinal; here, we extend these methods to the secondary analysis setting. Additional covariate data (here, the lactase persistence genotype Z_i) are collected only on the second-phase study group O ≡ {i : R_i = 1} ⊆ C, specified by the inclusion indicators R_i ∈ {0, 1}, analogously to the survey response/nonresponse setting of Rubin [21]. Observed data likelihoods may become sensitive to misspecification of the model for the response variable; the missing data can act as extra parameters, and the actual model parameters may lose their intended interpretation. This is a real problem especially in cohort sampling designs with a rare event of interest, since the proportion of uncollected covariate data in the study cohort may be very high.
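
To make the role of the inclusion indicators concrete, the following minimal Python sketch simulates a case-cohort selection and reports how much of the second-phase covariate data would remain uncollected when the event is rare. It is an illustration only, not the procedure used in the paper: the function name `case_cohort_indicators`, the Bernoulli subcohort sampling, and the values chosen for the cohort size, event rate, and subcohort fraction are all assumptions made for this example.

```python
import random


def case_cohort_indicators(events, subcohort_fraction, seed=0):
    """Return hypothetical inclusion indicators R_i for a case-cohort design.

    events[i] is 1 if subject i is a case and 0 otherwise. Every case is
    included in the second phase; in addition, each cohort member enters the
    subcohort independently with probability subcohort_fraction (Bernoulli
    sampling is an assumption of this sketch).
    """
    rng = random.Random(seed)
    indicators = []
    for event in events:
        in_subcohort = rng.random() < subcohort_fraction
        indicators.append(1 if (event == 1 or in_subcohort) else 0)
    return indicators


# Illustrative (assumed) settings: a large cohort with a rare event.
cohort_size = 10_000
event_rate = 0.02
subcohort_fraction = 0.05

rng = random.Random(1)
events = [1 if rng.random() < event_rate else 0 for _ in range(cohort_size)]

R = case_cohort_indicators(events, subcohort_fraction, seed=2)

# Second-phase study group O = {i : R_i = 1}.
second_phase_group = [i for i, r in enumerate(R) if r == 1]

# Proportion of the cohort whose additional covariate is never collected.
missing_fraction = 1 - len(second_phase_group) / cohort_size
print(f"second-phase group size: {len(second_phase_group)}")
print(f"proportion with uncollected covariate data: {missing_fraction:.3f}")
```

Under these assumed settings the expected inclusion probability is roughly 0.02 + 0.98 × 0.05, so well over 90% of the cohort has no second-phase covariate value, which is the situation in which the sensitivity described above becomes a practical concern.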

Methods
Definition
Conditional Likelihood Expression
Special Cases
Risk Set Sampling
Missing Second-Phase Covariate Data
Full Likelihood
Inverse Probability Weighting
Conditional Likelihood
Incident Outcomes and Left Truncation
Simulation Study
Multimodality under Full Likelihood When the Sampling Fraction Is Small
An Example with Real Data
Discussion
On Inverse-Probability-Weighted Pseudolikelihood Estimators
Relationship to Retrospective Likelihood
Mean and Variance of the Conditional Likelihood Score Function
