Abstract

Forward regression, a classical variable screening method, has been widely used for model building when the number of covariates is relatively low. However, it is seldom used in high-dimensional settings because of its computational burden and unknown theoretical properties. Recent work has shown that forward regression, coupled with an extended Bayesian information criterion (EBIC)-based stopping rule, can consistently identify all relevant predictors in high-dimensional linear regression settings. However, these results are based on the residual sum of squares from linear models, and it is unclear whether forward regression can be applied to more general regression settings, such as Cox proportional hazards models. We introduce a forward variable selection procedure for Cox models. It selects important variables sequentially according to the increment in partial likelihood, with an EBIC stopping rule. To our knowledge, this is the first study that investigates partial likelihood-based forward regression in high-dimensional survival settings and establishes selection consistency results. We show that, if the dimension of the true model is finite, forward regression can discover all relevant predictors within a finite number of steps, and that their order of entry is determined by the size of the increment in partial likelihood. As partial likelihood is not a regular density-based likelihood, we develop new theoretical results on partial likelihood and use them to establish the desired sure screening properties. The practical utility of the proposed method is examined via extensive simulations and an analysis of a subset of the Boston Lung Cancer Survival Cohort study, a hospital-based study for identifying biomarkers related to lung cancer patients’ survival.
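
As a rough illustration of the procedure described above, the following Python sketch adds, at each step, the covariate that gives the largest increase in the maximized Cox partial log-likelihood, and stops once the EBIC no longer decreases. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`cox_partial_loglik`, `fit_cox`, `ebic`, `forward_cox`), the Chen and Chen form of the EBIC with sample size `n` in the penalty, the choice `gamma=1.0`, the absence of tie handling, and the simulated data are all illustrative choices that do not come from the abstract.

```python
import math
import numpy as np

def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood, gradient and Hessian (assumes no tied times)."""
    order = np.argsort(-time)               # descending time: risk sets are prefixes
    X, event = X[order], event[order].astype(bool)
    eta = X @ beta
    shift = eta.max()
    w = np.exp(eta - shift)                 # shifted for numerical stability
    S0 = np.cumsum(w)
    S1 = np.cumsum(w[:, None] * X, axis=0)
    S2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
    mean = S1[event] / S0[event, None]      # risk-set weighted covariate means
    loglik = np.sum(eta[event] - shift - np.log(S0[event]))
    grad = np.sum(X[event] - mean, axis=0)
    hess = -np.sum(S2[event] / S0[event, None, None]
                   - mean[:, :, None] * mean[:, None, :], axis=0)
    return loglik, grad, hess

def fit_cox(X, time, event, n_iter=30, tol=1e-8):
    """Maximized partial log-likelihood via Newton steps with step halving."""
    beta = np.zeros(X.shape[1])
    ll, g, H = cox_partial_loglik(beta, X, time, event)
    if X.shape[1] == 0:                     # null model: nothing to optimize
        return ll
    for _ in range(n_iter):
        step = np.linalg.solve(H - 1e-8 * np.eye(beta.size), g)
        scale = 1.0
        for _halving in range(20):
            cand = beta - scale * step
            new_ll, new_g, new_H = cox_partial_loglik(cand, X, time, event)
            if new_ll >= ll:
                break
            scale /= 2.0
        converged = new_ll - ll < tol
        beta, ll, g, H = cand, new_ll, new_g, new_H
        if converged:
            break
    return ll

def ebic(loglik, k, n, p, gamma):
    """Assumed EBIC form: -2*loglik + k*log(n) + 2*gamma*log(choose(p, k))."""
    log_binom = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    return -2.0 * loglik + k * math.log(n) + 2.0 * gamma * log_binom

def forward_cox(X, time, event, gamma=1.0, max_steps=10):
    """Forward selection by partial-likelihood increment with an EBIC stop."""
    n, p = X.shape
    selected = []
    best = ebic(fit_cox(np.empty((n, 0)), time, event), 0, n, p, gamma)
    for _ in range(min(max_steps, p)):
        candidates = [j for j in range(p) if j not in selected]
        lls = [fit_cox(X[:, selected + [j]], time, event) for j in candidates]
        i_best = int(np.argmax(lls))        # largest partial-likelihood increment
        new = ebic(lls[i_best], len(selected) + 1, n, p, gamma)
        if new >= best:                     # EBIC stopped improving
            break
        selected.append(candidates[i_best])
        best = new
    return selected

# Simulated example (illustrative data only): covariates 0 and 3 are relevant.
rng = np.random.default_rng(0)
n, p = 300, 20
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[[0, 3]] = [1.0, -1.0]
t_event = rng.exponential(1.0 / np.exp(X @ true_beta))
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens
selected = forward_cox(X, time, event)
print("selected covariates, in order of entry:", selected)
```

Each forward step here refits one Cox model per remaining candidate, which is affordable in this toy setting but grows with the number of covariates; this is the kind of computational burden the abstract mentions for high-dimensional data.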
