Abstract
Subsampling designs are useful for reducing the computational load and storage cost of large-scale data analysis. For massive survival data with right censoring, we propose a class of optimal subsampling designs under the widely used Cox model. The proposed designs utilize information from both the outcome and the covariates. Different forms of the design can be derived adaptively to meet various targets, such as optimizing the overall estimation accuracy or minimizing the variation of a specific linear combination of the estimators. Given the subsampled data, the inverse probability weighting approach is employed to estimate the model parameters. The resultant estimators are shown to be consistent and asymptotically normally distributed. Simulation results indicate that the proposed subsampling design yields more efficient estimators than uniform subsampling when the subsamples are of comparable size. Additionally, subsampling estimation significantly reduces the computational load and storage cost relative to full-data estimation. An analysis of a real data example is provided for illustration.
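To make the inverse probability weighting step concrete, the following is a minimal sketch of fitting a Cox model on a non-uniform subsample, with each retained observation weighted by the inverse of its sampling probability. Everything in it is an illustrative assumption rather than the paper's method: the function name `weighted_cox_fit`, the simulated data, the Poisson (independent Bernoulli) sampling scheme, and in particular the sampling probabilities `pi`, which are an ad hoc choice that depends on the event indicator and the covariates and are not the optimal probabilities derived in the paper. The sketch also assumes there are no tied event times.

```python
import numpy as np

def weighted_cox_fit(time, event, X, weights, n_iter=25, tol=1e-8):
    """Maximize a weighted Cox partial log-likelihood with Newton's method.

    Hypothetical helper for illustration; assumes no tied event times.
    `weights` are inverse sampling probabilities (all ones for full data).
    """
    order = np.argsort(-time)  # descending time: cumulative sums run over risk sets
    X, event, w = X[order], event[order], weights[order]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = w * np.exp(X @ beta)                      # weighted relative risks
        s0 = np.cumsum(r)                             # sum over {j: t_j >= t_i}
        s1 = np.cumsum(r[:, None] * X, axis=0)
        s2 = np.cumsum(r[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
        xbar = s1 / s0[:, None]                       # weighted risk-set covariate mean
        d = w * event                                 # only events contribute terms
        grad = (d[:, None] * (X - xbar)).sum(axis=0)
        cov = s2 / s0[:, None, None] - xbar[:, :, None] * xbar[:, None, :]
        info = (d[:, None, None] * cov).sum(axis=0)
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated right-censored data (assumed setup, not from the paper).
rng = np.random.default_rng(0)
n, beta_true = 20000, np.array([0.5, -0.5])
X = rng.normal(size=(n, 2))
T = rng.exponential(1.0 / np.exp(X @ beta_true))      # hazard exp(x' beta)
C = rng.exponential(2.0, size=n)                      # independent censoring
time, event = np.minimum(T, C), (T <= C).astype(float)

# Ad hoc non-uniform probabilities using outcome and covariates (NOT the optimal design).
raw = (1.0 + event) * (1.0 + np.linalg.norm(X, axis=1))
pi = np.minimum(1.0, 2000 * raw / raw.sum())          # expected subsample size about 2000
keep = rng.random(n) < pi                             # Poisson subsampling

beta_sub = weighted_cox_fit(time[keep], event[keep], X[keep], 1.0 / pi[keep])
beta_full = weighted_cox_fit(time, event, X, np.ones(n))
print("subsample size:", int(keep.sum()))
print("IPW subsample estimate:", beta_sub)
print("full-data estimate:", beta_full)
```

The weights enter both the event terms and the risk-set sums, so the weighted score is an estimate of the full-data score. Replacing `pi` with design-specific probabilities changes only the two lines that define it.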