Abstract

Background: As the scale and power of GWAS have increased to detect the small genetic effect sizes involved in complex polygenic traits, the results of GWAS, especially SNP effect size estimates, are increasingly utilised for prediction. A popular approach is the application of Polygenic Risk Scores (PRS). However, the effect size estimates are inherently prone to overfit the specific samples on which the GWAS were performed (e.g. “the Winner's Curse”). The inflation of effect size estimation reduces the out-of-sample accuracy of prediction based on GWAS results. Many shrinkage methods have been developed to correct for such inflation. As GWAS results and large individual genotype data sets become widely available, there is greater opportunity to accurately evaluate effect size inflation and compare the performance of different shrinkage methods.

Methods: We develop a novel permutation-based shrinkage method and compare its performance with previously developed methods based on local false discovery rate and the LASSO. We evaluate the performance of the methods across a wide range of phenotypes and investigate different factors (e.g. phenotype heritability) that influence their impact on the predictive power of polygenic risk scores.

Results: Using large-scale real genotype data, we show that SNP effect sizes are markedly overfit even in relatively large samples, but that our permutation-based shrinkage approach can improve PRS prediction dramatically.

Discussion: Our results suggest that GWAS results can be adjusted using an efficient empirical approach to provide more accurate effect size estimates and thus greater downstream predictive power. This approach could be applied to a wide variety of big data settings.
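As a purely illustrative aside, the two ideas the abstract relies on can be sketched in a few lines of Python: a PRS as a weighted sum of allele counts, and the Winner's Curse as the upward bias in effect estimates that pass a significance cut-off. This is not the permutation-based method described above, whose details the abstract does not give. The function names, the single shared true effect, the standard error and the z-score cut-off are all assumptions chosen for the toy simulation.

```python
import random


def polygenic_risk_score(effects, genotypes):
    """Weighted sum of allele counts (0, 1 or 2) using per-SNP effect sizes."""
    if len(effects) != len(genotypes):
        raise ValueError("effects and genotypes must have the same length")
    return sum(b * g for b, g in zip(effects, genotypes))


def simulate_winners_curse(n_snps=5000, true_effect=0.02, se=0.01,
                           z_cut=3.0, seed=1):
    """Toy simulation (assumed setup, not from the paper).

    Every SNP has the same true effect; each estimate is the true effect
    plus Gaussian noise with standard deviation `se`. Only estimates whose
    absolute z-score exceeds `z_cut` are kept, mimicking selection on
    significance. Returns (mean selected estimate, true effect).
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_snps)]
    selected = [b for b in estimates if abs(b / se) > z_cut]
    mean_selected = sum(selected) / len(selected)
    return mean_selected, true_effect


if __name__ == "__main__":
    mean_selected, truth = simulate_winners_curse()
    print(f"true effect: {truth:.4f}, mean selected estimate: {mean_selected:.4f}")
    # Score one hypothetical individual with two SNPs.
    print(polygenic_risk_score([0.5, -0.25], [2, 1]))
```

In this toy setting the selected estimates average well above the true effect, which is the inflation that shrinkage methods aim to correct before the estimates are used as PRS weights.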
