Abstract

Using a genetic risk score (GRS) to predict a phenotype in a target sample can be complicated by missing data on the single nucleotide polymorphisms (SNPs) that make up the GRS. This is usually addressed by imputation, by omitting the missing SNPs, or by replacing them with proxy SNPs. To assess the impact of the omission and proxy approaches on effect size estimation and on the predictive ability of weighted and unweighted GRS built from small numbers of SNPs, we simulated a dichotomous phenotype conditional on real genotype data. We considered scenarios in which the proportion of missing SNPs ranged from 20% to 70%. We assessed the impact of omitting or replacing missing SNPs on the association between the GRS and the phenotype, the corresponding statistical power, and the area under the receiver operating characteristic (ROC) curve. Compared with the proxy approaches, omission resulted in a larger bias of the effect size towards the null, lower predictive ability and a greater loss of statistical power. The predictive ability of a weighted GRS that includes SNPs with large weights depends on the availability of these large-weight SNPs.
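The two strategies for missing SNPs can be illustrated with a minimal Python sketch. It assumes that a GRS is a weighted sum of risk-allele counts (every weight set to 1.0 for an unweighted GRS), that the phenotype is drawn from a logistic model on the full GRS, and that the AUC is computed as the Mann-Whitney probability that a case scores higher than a control. The SNP ids, weights, intercept, proxy mapping and the 90% genotype agreement used to mimic linkage disequilibrium are hypothetical illustration values, not the study's simulation settings.

```python
import math
import random


def grs(genotypes, weights, available, proxies=None):
    """Genetic risk score for one individual.

    genotypes: dict SNP id -> risk-allele count (0, 1 or 2)
    weights:   dict SNP id -> weight (set every weight to 1.0 for an unweighted GRS)
    available: set of SNP ids genotyped in the target sample
    proxies:   optional dict missing SNP id -> proxy SNP id (hypothetical input)
    """
    score = 0.0
    for snp, w in weights.items():
        if snp in available:
            score += w * genotypes[snp]
        elif proxies and proxies.get(snp) in available:
            # Proxy approach: use the proxy's allele count with the original SNP's weight
            score += w * genotypes[proxies[snp]]
        # Omission approach: a missing SNP without a usable proxy adds nothing
    return score


def simulate_case_status(score, intercept, rng):
    """Draw a 0/1 phenotype from a logistic model on the full ("gold standard") GRS."""
    p = 1.0 / (1.0 + math.exp(-(intercept + score)))
    return 1 if rng.random() < p else 0


def auc(scores, status):
    """Area under the ROC curve as the Mann-Whitney probability (ties count 1/2)."""
    cases = [s for s, y in zip(scores, status) if y == 1]
    controls = [s for s, y in zip(scores, status) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


rng = random.Random(1)
weights = {"rs1": 0.8, "rs2": 0.3, "rs3": 0.3}  # hypothetical log odds ratios
people = []
for _ in range(1000):
    g = {snp: rng.choice([0, 1, 2]) for snp in weights}
    # Hypothetical proxy "rs1p" agrees with rs1 90% of the time (stand-in for high LD)
    g["rs1p"] = g["rs1"] if rng.random() < 0.9 else rng.choice([0, 1, 2])
    people.append(g)

# Phenotype simulated from the GRS using all SNPs
status = [simulate_case_status(grs(g, weights, set(weights)), -1.0, rng) for g in people]

target = {"rs2", "rs3", "rs1p"}  # rs1 is not genotyped in the target sample
omit = [grs(g, weights, target) for g in people]
prox = [grs(g, weights, target, proxies={"rs1": "rs1p"}) for g in people]
print(f"omission AUC {auc(omit, status):.3f}, proxy AUC {auc(prox, status):.3f}")
```

Keeping the missing SNP's weight when substituting its proxy is one simple design choice; a proxy in weaker LD might warrant a shrunken weight, which this sketch does not attempt.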

Highlights

  • The potential to understand the genetic underpinnings of complex diseases has been extended considerably with the advent of the human genome project [1]

  • When 20% of single nucleotide polymorphisms (SNPs) were unavailable, correlations with the gold standard genetic risk score (GRS) were all >0.8 for unweighted GRS; 84% of the correlations for weighted GRS were >0.8 (S1 Table)

  • At any given proportion of missing SNPs, and holding the quality of proxy SNPs constant, the median correlation was higher for unweighted than for weighted GRS, and the interquartile range was always wider for weighted GRS


Summary

Introduction

The potential to understand the genetic underpinnings of complex diseases has been extended considerably with the advent of the Human Genome Project [1]. The effect of a single SNP on a complex phenotype is typically small, explaining only a small proportion of the variability in the phenotype. Since it was reported that combining “risk alleles” of selected SNPs into an “aggregate risk score” predicted schizophrenia and bipolar disorder [2], there has been growing interest in so-called genetic risk scores (GRS) [3,4,5,6] for a variety of phenotypes. SNPs are selected based on the nominal P-value of their association with a specific phenotype in a genome-wide association study (GWAS) that serves as the discovery sample. The association between this GRS and the phenotype is studied
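The selection-and-scoring step described above can be sketched as follows. The sketch assumes that the discovery GWAS summary statistics give an odds ratio per risk allele and a nominal P-value for each SNP, that genotypes in the target sample are coded as counts of that risk allele, and that the weighted GRS uses log odds ratios as weights (a common choice, not necessarily the one used in the cited studies). The `GwasHit` record, SNP ids, odds ratios and the 0.001 threshold are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GwasHit:
    """One row of discovery-GWAS summary statistics (hypothetical layout)."""
    snp: str
    odds_ratio: float  # per copy of the risk allele
    p_value: float     # nominal P-value in the discovery GWAS


def select_snps(summary, p_threshold):
    """Keep SNPs whose nominal discovery P-value is below the threshold."""
    return [h for h in summary if h.p_value < p_threshold]


def risk_score(genotypes, hits, weighted=True):
    """Unweighted GRS: risk-allele count; weighted GRS: counts times log(OR)."""
    return sum((math.log(h.odds_ratio) if weighted else 1.0) * genotypes[h.snp]
               for h in hits)


summary = [
    GwasHit("rs1", 1.30, 2e-9),
    GwasHit("rs2", 1.10, 4e-4),
    GwasHit("rs3", 1.05, 0.20),
]
hits = select_snps(summary, p_threshold=0.001)  # keeps rs1 and rs2
person = {"rs1": 2, "rs2": 1, "rs3": 2}         # risk-allele counts in the target sample
print(risk_score(person, hits, weighted=False))  # 3.0
print(risk_score(person, hits))                  # 2*log(1.30) + log(1.10)
```

The P-value threshold is the main tuning choice in this step: a stricter threshold yields fewer SNPs, so each missing SNP removes a larger share of the score.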
