Abstract

A large body of literature has proposed and studied variable selection procedures for high-dimensional data, with most of it focusing on selection properties and point estimation properties. However, few studies have considered the construction of confidence intervals in high-dimensional variable selection problems. In this thesis, we propose two approaches to address this problem for high-dimensional linear and accelerated failure time models. This work is motivated by recent cancer research, in which researchers would like to analyze clinical and genomic information simultaneously in order to identify potential risk factors for different types of cancer. Moreover, they wish to account for the effect of the clinical information while selecting the genes and pathways that are potentially associated with the disease. To do so, we consider a model with two sets of covariates. One set consists of low-dimensional yet practically more interpretable variables, such as clinical and treatment variables. The other set consists of a large number of variables that can be related to the response variable in complicated ways, for example gene expression levels. Two approaches are established to select important variables from the high-dimensional set and to construct confidence intervals for the parameters of the low-dimensional set. The first approach is called the partially penalized method. This method first selects variables from the high-dimensional set and then fits a traditional regression model using the selected variables along with the variables in the
