Abstract
In this article, we consider a class of regularized regression methods under the additive hazards model with censored survival data and propose a novel approach that achieves simultaneous group selection, variable selection, and parameter estimation for high-dimensional censored data by combining a composite penalty with the pseudoscore. We develop a local coordinate descent (LCD) algorithm for efficient computation and establish the theoretical properties of the proposed selection methods. The resulting selectors possess both the group selection oracle property and the variable selection oracle property, and thus simultaneously identify important groups and important variables within selected groups with high probability. Simulation studies demonstrate that the proposed method and the LCD algorithm perform well. A real data example is provided for illustration.
Highlights
Variable selection becomes especially challenging when the dataset exhibits group structure
Most survival data are subject to censoring, which complicates the data structure and makes regression modeling more difficult; nevertheless, a large body of literature proposes approaches to variable selection at the individual and group levels based on the Cox model
The asymptotic properties of the proposed estimators include both the group selection oracle property and the variable selection oracle property: important groups and important variables within selected groups are consistently identified, and the resulting estimators are asymptotically normal under some regularity conditions
Summary
Variable selection becomes especially challenging when the dataset exhibits group structure. Under the framework of the additive hazards model for high-dimensional data, we propose a novel approach that captures group structure while retaining sparsity of covariates, so that it simultaneously selects important variables at the individual and group levels while also providing parameter estimates. This is achieved by combining a composite penalty with the pseudoscore method, where the number of covariates p is allowed to grow nonpolynomially with the sample size n.
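To make the combination of pseudoscore and composite penalty concrete, the following is a minimal sketch of the kind of objective involved. It is not taken from the text above. It assumes time-independent covariates $Z_i$, the standard counting-process notation ($N_i$ for the event process, $Y_i$ for the at-risk indicator, $\tau$ for the end of follow-up), and a generic composite penalty with a hypothetical outer penalty $P_{\mathrm{O}}$ applied across $J$ groups and inner penalty $P_{\mathrm{I}}$ applied within groups of sizes $p_j$. The specific penalty functions and tuning parameters used by the authors are not stated here.

```latex
% Additive hazards model for subject i with covariate vector Z_i
\lambda(t \mid Z_i) = \lambda_0(t) + \beta^{\top} Z_i

% Pseudoscore, which is linear in \beta
U(\beta) = \frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}
  \{Z_i - \bar{Z}(t)\}\,\{dN_i(t) - Y_i(t)\,\beta^{\top} Z_i\,dt\}
  = b - V\beta,
\qquad
\bar{Z}(t) = \frac{\sum_{j=1}^{n} Y_j(t)\,Z_j}{\sum_{j=1}^{n} Y_j(t)}

b = \frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}\{Z_i - \bar{Z}(t)\}\,dN_i(t),
\qquad
V = \frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau} Y_i(t)\,\{Z_i - \bar{Z}(t)\}^{\otimes 2}\,dt

% Least-squares-type loss whose negative gradient is the pseudoscore
L(\beta) = \tfrac{1}{2}\,\beta^{\top} V \beta - b^{\top}\beta,
\qquad
-\nabla L(\beta) = U(\beta)

% Penalized objective with a generic composite (outer/inner) penalty (assumed form)
Q(\beta) = L(\beta)
  + \sum_{j=1}^{J} P_{\mathrm{O}}\!\Bigl(\sum_{k=1}^{p_j} P_{\mathrm{I}}\bigl(|\beta_{jk}|\bigr)\Bigr)
```

Because $L(\beta)$ is quadratic in $\beta$, each coordinate-wise update has a simple form, which is one reason a coordinate descent scheme is a natural fit for objectives of this shape.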