Abstract

We consider the variable selection problem when the response is subject to censoring. A distinctive feature of this setting is that the information content of the sampled units varies with the censoring times. Our approach is based on model selection, where all 2^k possible models are entertained, and we adopt an objective Bayesian perspective, in which the choice of prior distributions is a delicate issue given the well-known sensitivity of Bayes factors to these prior inputs. We show that borrowing priors from the ‘uncensored’ literature may lead to unsatisfactory results, as this default procedure implicitly assumes a uniform contribution of all units independently of their censoring times. In this paper, we develop specific methodology based on a generalization of the g-priors that explicitly addresses the particularities of survival problems. We argue that it behaves comparatively better than standard approaches on the basis of arguments specific to variable selection problems (e.g. predictive matching) in the particular case of the accelerated failure time model with lognormal errors. We apply the methodology to a recent large epidemiological study of breast cancer survival rates in Castellón, a province of Spain.

Highlights

  • In variable selection we have k possible explanatory variables, but it is uncertain which of these are relevant to explain the response

  • Our research is rooted in the Bayesian paradigm and, more precisely, in methods based on the posterior distribution that assigns to each candidate model its probability conditional on the observed data

  • An illustration of the potential misbehavior of such default procedures is presented in Section 3, where we show how a group of experimental units with very small censoring times may severely modify the result of the variable selection exercise

Summary

Introduction and motivation

In variable selection we have k possible explanatory variables, but it is uncertain which of these are relevant to explain the response. The developed ideas are potentially useful for other types of parametric or semiparametric models usually employed in survival analysis. This family of priors, which has been studied in depth in Berger and Pericchi (2001) and Bayarri and García-Donato (2007), has received much attention in the literature and has been extended to problems beyond the original Gaussian model to include various types of error distributions. These methods are strongly based on the concept of minimal training sample (see Berger and Pericchi, 2004, for a review of the topic), whose definition is intriguing in problems with observations with different information content (as here). Strategies to circumvent these difficulties have been developed in the series of papers Perra et al. (2013), Cabras et al. (2014) and Cabras et al. (2015), but these approaches are computationally intensive, since an integral must be evaluated for every training sample and many integrals are needed for one model comparison.
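
To make the model-selection setup concrete, the following is a minimal sketch of how the 2^k candidate models can be enumerated and how Bayes factors against a common reference model turn into posterior model probabilities. It is only an illustration of the general mechanics, not the paper's methodology: the function names are hypothetical, the log Bayes factors are made-up numbers, and a uniform prior over models is assumed when none is supplied. Computing the Bayes factors themselves under censoring is the subject of the paper and is not attempted here.

```python
from itertools import product
import math


def enumerate_models(k):
    """Return all 2**k inclusion vectors; entry j is 1 when covariate j is in the model."""
    return list(product((0, 1), repeat=k))


def posterior_model_probs(log_bf, log_prior=None):
    """Turn log Bayes factors (each model vs. one common reference model)
    into posterior model probabilities."""
    models = list(log_bf)
    if log_prior is None:
        # Assumption for this sketch: uniform prior over the model space.
        log_prior = {m: 0.0 for m in models}
    scores = [log_bf[m] + log_prior[m] for m in models]
    top = max(scores)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {m: w / total for m, w in zip(models, weights)}


def inclusion_probs(post, k):
    """Posterior inclusion probability of each covariate: sum over models containing it."""
    return [sum(p for m, p in post.items() if m[j]) for j in range(k)]


# Toy example with k = 2 covariates and made-up log Bayes factors
# against the null model (0, 0); these numbers are purely illustrative.
k = 2
models = enumerate_models(k)
toy_log_bf = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 1.2}
post = posterior_model_probs(toy_log_bf)
incl = inclusion_probs(post, k)
```

Because the model space doubles with every added covariate, exhaustive enumeration of this kind is only practical for moderate k.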

The statistical model considered
Motivating example
General considerations
The prior covariance matrix
Properties of ΣM and the proposed prior
Computing Bayes factors
The approximated Bayes factor is
Predictive matching
Variable selection
A simulation study over heart transplant data
Findings
Further remarks