Abstract

The majority of genes have a genetic component to their expression. Elastic nets have been shown effective at predicting tissue-specific, individual-level gene expression from genotype data. We apply principal component analysis (PCA), linkage disequilibrium pruning, or the combination of the two to reduce, or generate, a lower-dimensional representation of the genetic variants used as inputs to the elastic net models for the prediction of gene expression. Our results show that, in general, elastic nets attain their best performance when all genetic variants are included as inputs; however, a relatively low number of principal components can effectively summarize the majority of genetic variation while reducing the overall computation time. Specifically, 100 principal components reduce the computational time of the models by over 80% with only an 8% loss in R2. Finally, linkage disequilibrium pruning does not effectively reduce the genetic variants for predicting gene expression. As predictive models are commonly made for over 27,000 genes for more than 50 tissues, PCA may provide an effective method for reducing the computational burden of gene expression analysis.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call