Abstract
We consider the variable selection problem when the response is subject to censoring. A distinctive feature of this setting is that the information content of the sampled units varies with their censoring times. Our approach is based on model selection, in which all 2^k possible models are entertained, and we adopt an objective Bayesian perspective, in which the choice of prior distributions is a delicate issue given the well-known sensitivity of Bayes factors to these prior inputs. We show that borrowing priors from the ‘uncensored’ literature may lead to unsatisfactory results, as this default procedure implicitly assumes a uniform contribution of all units independently of their censoring times. In this paper, we develop specific methodology based on a generalization of the g-priors that explicitly addresses the particularities of survival problems. We argue that it behaves better than standard approaches, on the basis of arguments specific to variable selection problems (such as predictive matching), in the particular case of the accelerated failure time model with lognormal errors. We apply the methodology to a recent large epidemiological study of breast cancer survival rates in Castellón, a province of Spain.
Highlights
In variable selection we have k possible explanatory variables, but it is uncertain which of these are relevant to explain the response
Our research is rooted in the Bayesian paradigm and, more precisely, in methods based on the posterior distribution, which assigns to each candidate model its probability conditional on the observed data
An illustration of the potential misbehavior of such default procedures is presented in Section 3, where we show how a group of experimental units with very small censoring times may severely modify the result of the variable selection exercise
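The posterior model probabilities referred to in the highlights can be written out explicitly. The following is a sketch in standard notation that is assumed here rather than taken from the paper: γ indexes the 2^k candidate models, M_0 denotes the model with no explanatory variables, m_γ(y) is the marginal likelihood of model M_γ, and π(M_γ) is its prior probability.

```latex
% Assumed notation (not from the source): \gamma \in \{0,1\}^k indexes the 2^k models,
% m_\gamma(y) is the marginal likelihood of M_\gamma, \pi(M_\gamma) its prior probability,
% and B_{\gamma 0}(y) is the Bayes factor of M_\gamma against the null model M_0.
\[
  p(M_\gamma \mid y)
  = \frac{m_\gamma(y)\,\pi(M_\gamma)}
         {\sum_{\gamma' \in \{0,1\}^k} m_{\gamma'}(y)\,\pi(M_{\gamma'})}
  = \frac{B_{\gamma 0}(y)\,\pi(M_\gamma)}
         {\sum_{\gamma' \in \{0,1\}^k} B_{\gamma' 0}(y)\,\pi(M_{\gamma'})},
  \qquad
  B_{\gamma 0}(y) = \frac{m_\gamma(y)}{m_0(y)},
\]
\[
  m_\gamma(y) = \int f_\gamma(y \mid \theta_\gamma)\,\pi_\gamma(\theta_\gamma)\,d\theta_\gamma .
\]
```

The prior π_γ(θ_γ) on the model-specific parameters enters each Bayes factor through the integral defining m_γ(y), which is why the choice of these priors directly affects the posterior model probabilities.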
Summary
In variable selection we have k possible explanatory variables, but it is uncertain which of these are relevant to explain the response. The developed ideas are potentially useful for other types of parametric or semiparametric models usually employed in survival analysis. This family of priors, which has been studied in depth in Berger and Pericchi (2001) and Bayarri and García-Donato (2007), has received much attention in the literature and has been extended to problems beyond the original Gaussian model to include various types of error distributions. These methods are strongly based on the concept of minimal training sample (see Berger and Pericchi, 2004, for a review of the topic), whose definition is not obvious in problems with observations with different information content (as here). Strategies to circumvent these difficulties have been developed in the series of papers Perra et al. (2013), Cabras et al. (2014) and Cabras et al. (2015), but these approaches are computationally intensive, since an integral must be evaluated for every training sample and many integrals are needed for one model comparison.