Abstract
Variable selection in genome-wide association studies is a daunting and statistically challenging task because there are more variables than subjects. We propose an approach that uses principal-component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO) to identify gene-gene interactions in genome-wide association studies. PCA was first used to reduce the dimension of the single-nucleotide polymorphisms (SNPs) within each gene. The interactions of the gene PCA scores were then placed into the LASSO to determine whether any gene-gene signals exist. We have extended the PCA-LASSO approach using the bootstrap to estimate the standard errors and confidence intervals of the LASSO coefficient estimates. This method was compared with placing the raw SNP values into the LASSO and with a logistic model containing individual gene-gene interactions. We demonstrated these methods with the Genetic Analysis Workshop 16 rheumatoid arthritis genome-wide association study data, and our results identified a few gene-gene signals. Based on our results, the PCA-LASSO method shows promise in identifying gene-gene interactions, and, at this time, we suggest using it with other conventional approaches, such as generalized linear models, to narrow down genetic signals.
Highlights
The goal of this paper is to develop and evaluate prediction methods and tools for genome-wide association studies, for variable selection and dimension reduction
We have extended the least absolute shrinkage and selection operator (LASSO) method to estimate standard errors and confidence intervals with the bootstrap
Whether the principal-component scores or the raw single-nucleotide polymorphism (SNP) values were placed into the LASSO, the final results were the same
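The PCA-LASSO pipeline with bootstrap standard errors described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are simulated, only the first principal component per gene is kept, the LASSO is an L1-penalized logistic regression fitted by proximal gradient descent, the penalty `lam` is fixed rather than tuned, and the bootstrap resamples rows of the already-built interaction matrix. All function names (`first_pc_scores`, `interaction_design`, `lasso_logistic`, `bootstrap_lasso`) are hypothetical.

```python
import numpy as np
from itertools import combinations

def first_pc_scores(snps):
    """First principal-component scores of one gene's SNP matrix (subjects x SNPs)."""
    centered = snps - snps.mean(axis=0)
    # Rows of vt are the principal directions of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def interaction_design(gene_snps):
    """Standardized pairwise products of gene-level PC scores."""
    scores = [first_pc_scores(g) for g in gene_snps]
    pairs = list(combinations(range(len(scores)), 2))
    X = np.column_stack([scores[i] * scores[j] for i, j in pairs])
    return (X - X.mean(axis=0)) / X.std(axis=0), pairs

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_logistic(X, y, lam, n_iter=300):
    """L1-penalized logistic regression (unpenalized intercept) via proximal gradient."""
    n, p = X.shape
    beta, intercept = np.zeros(p), 0.0
    # Step size 1/L, using an upper bound L on the gradient's Lipschitz constant.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
        resid = prob - y
        intercept -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return beta

def bootstrap_lasso(X, y, lam, n_boot=50, seed=0):
    """Bootstrap standard errors and 95% percentile intervals of the LASSO coefficients."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        draws[b] = lasso_logistic(X[idx], y[idx], lam)
    lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
    return draws.std(axis=0, ddof=1), lower, upper

# Simulated data (an assumption for illustration): 3 genes, 5 correlated SNPs each.
rng = np.random.default_rng(1)
n = 200
genes = []
for _ in range(3):
    base = rng.binomial(2, 0.3, size=(n, 1))   # shared genotype inducing correlation
    noise = rng.binomial(2, 0.3, size=(n, 5))
    genes.append(np.where(rng.random((n, 5)) < 0.8, base, noise).astype(float))

X, pairs = interaction_design(genes)
# Case/control status driven by the gene 0 x gene 1 interaction only.
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-1.5 * X[:, 0]))).astype(float)

beta = lasso_logistic(X, y, lam=0.05)
se, lower, upper = bootstrap_lasso(X, y, lam=0.05)
for pair, b, s, lo, hi in zip(pairs, beta, se, lower, upper):
    print(f"genes {pair}: beta={b:.3f}, se={s:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

In practice the penalty would be chosen by cross-validation, and the PCA step could be repeated inside each bootstrap resample so that the intervals also reflect uncertainty in the principal components.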
Summary
The goal of this paper is to develop and evaluate prediction methods and tools for variable selection and dimension reduction in genome-wide association studies. Technical advances have enabled the collection of massive high-dimensional datasets in such studies. This has called for broader research into dimension-reduction techniques that provide methods for prediction and variable selection. During the last decade, Li [1], Tibshirani [2], and Efron et al. [3] have paved new directions for dimension-reduction techniques and broadened the area to other applications of prediction, including genetics. We explore extensions of currently existing dimension-reduction methods and variable-