Abstract

Variable selection in genome-wide association studies is daunting and statistically challenging because there are more variables than subjects. We propose an approach that uses principal-component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO) to identify gene-gene interactions in genome-wide association studies. PCA was first used to reduce the dimension of the single-nucleotide polymorphisms (SNPs) within each gene. The interactions of the gene PCA scores were then placed into the LASSO to determine whether any gene-gene signals exist. We extended the PCA-LASSO approach with the bootstrap to estimate the standard errors and confidence intervals of the LASSO coefficient estimates. This method was compared with placing the raw SNP values into the LASSO and with a logistic model containing individual gene-gene interactions. We demonstrated these methods on the Genetic Analysis Workshop 16 rheumatoid arthritis genome-wide association study data, and our results identified a few gene-gene signals. Based on our results, the PCA-LASSO method shows promise in identifying gene-gene interactions; at this time, we suggest using it alongside other conventional approaches, such as generalized linear models, to narrow down genetic signals.
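The first two steps described above (per-gene PCA, then interactions of the gene scores) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes genotypes coded 0/1/2, keeps only the first principal component per gene, and forms all pairwise products of gene scores. The function names `first_pc_scores` and `interaction_design`, the gene labels, and the simulated data are hypothetical.

```python
import numpy as np
from itertools import combinations


def first_pc_scores(snps):
    """First-principal-component scores for one gene (subjects x SNPs)."""
    centered = snps - snps.mean(axis=0)
    # Rows of vt are the principal axes of the centered SNP matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]


def interaction_design(gene_snps):
    """Build one column per gene pair: product of the two genes' PC scores.

    gene_snps maps a gene name to its (n_subjects x n_snps) genotype array.
    Keeping a single component per gene is an assumption of this sketch.
    """
    scores = {g: first_pc_scores(x) for g, x in gene_snps.items()}
    names, cols = [], []
    for a, b in combinations(sorted(scores), 2):
        names.append(f"{a}x{b}")
        cols.append(scores[a] * scores[b])
    return names, np.column_stack(cols)


# Simulated genotypes for three hypothetical genes, 50 subjects, 4 SNPs each.
rng = np.random.default_rng(0)
genes = {f"G{i}": rng.integers(0, 3, size=(50, 4)).astype(float) for i in (1, 2, 3)}
names, Z = interaction_design(genes)
print(names, Z.shape)
```

The matrix `Z` is what would be passed to a LASSO fit. Collapsing each gene to a score first means the number of interaction columns grows with the number of gene pairs rather than the number of SNP pairs.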

Highlights

  • The goal of this paper is to develop and evaluate prediction methods and tools for variable selection and dimension reduction in genome-wide association studies

  • We have extended the least absolute shrinkage and selection operator (LASSO) method to estimate standard errors and confidence intervals with the bootstrap

  • Whether the principal-component scores or the raw single-nucleotide polymorphism (SNP) values were placed into the LASSO, the final results were the same
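The bootstrap extension mentioned in the highlights can be illustrated with a short sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the LASSO is fitted as an L1-penalized logistic regression by proximal gradient descent, subjects are resampled with replacement, and the intervals are percentile intervals. The names `lasso_logistic` and `bootstrap_lasso`, the penalty value, and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_logistic(X, y, lam, n_iter=300):
    """L1-penalized logistic regression by proximal gradient (ISTA).

    The intercept is not penalized. Returns the slope coefficients only.
    """
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    # Step size 1/L, where L bounds the curvature of the mean logistic loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - y
        b0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return beta


def bootstrap_lasso(X, y, lam, n_boot=100, seed=0):
    """Bootstrap standard errors and 95% percentile intervals for the slopes."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        draws[b] = lasso_logistic(X[idx], y[idx], lam)
    se = draws.std(axis=0, ddof=1)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
    return se, lo, hi


# Simulated data: only the first of five predictors affects case status.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
p_case = 1.0 / (1.0 + np.exp(-1.5 * X[:, 0]))
y = (rng.random(200) < p_case).astype(float)
se, lo, hi = bootstrap_lasso(X, y, lam=0.02)
for j in range(X.shape[1]):
    print(f"beta[{j}]: SE={se[j]:.3f}, 95% CI=({lo[j]:.3f}, {hi[j]:.3f})")
```

Because the LASSO shrinks some coefficients exactly to zero, their bootstrap distributions can pile up at zero, so intervals of this kind are best read as a rough screening aid rather than as exact inference.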


Summary

Introduction

The goal of this paper is to develop and evaluate prediction methods and tools for variable selection and dimension reduction in genome-wide association studies. Technical advances have enabled the collection of massive high-dimensional datasets in such studies. This has called for broader research into dimension-reduction techniques to provide methods for prediction and variable selection. During the last decade, Li [1], Tibshirani [2], and Efron et al. [3] have paved new directions for dimension-reduction techniques and broadened the area to other applications of prediction, including genetics. We explore extensions of currently existing dimension-reduction methods and variable-

