Abstract

Genotype imputation has become standard practice in modern genetic studies. As sequencing-based reference panels continue to grow, an increasing number of markers are being imputed well, but at the same time even more markers with relatively low minor allele frequency (MAF) are being imputed with low quality. Here, we propose new methods that incorporate imputation uncertainty into downstream association analysis, with improved power and/or computational efficiency. We consider two scenarios: I) when posterior probabilities of all potential genotypes are estimated; and II) when only the one-dimensional summary statistic, imputed dosage, is available. For scenario I, we have developed an expectation-maximization likelihood-ratio test (EM-LRT) for association based on posterior probabilities. When only imputed dosages are available (scenario II), we first sample the genotype probabilities from their posterior distribution given the dosages, and then apply the EM-LRT to the sampled probabilities. Our simulations show that the type I error of the proposed EM-LRT methods is protected under both scenarios. Compared with existing methods, EM-LRT-Prob (for scenario I) offers optimal statistical power across a wide spectrum of MAF and imputation quality. EM-LRT-Dose (for scenario II) achieves a level of statistical power similar to that of EM-LRT-Prob and outperforms the standard Dosage method, especially for markers with relatively low MAF or imputation quality. Applications to two real data sets, the Cebu Longitudinal Health and Nutrition Survey study and the Women’s Health Initiative Study, provide further support for the validity and efficiency of our proposed methods.

Highlights

  • Genotype imputation has become standard practice in modern genetic studies [1][2][3][4]

  • Methods based on imputed dosage provide an attractive compromise between modeling complexity, computational efficiency and statistical power; they have been shown analytically to be optimal among methods based on one-dimensional summary statistics [11], and have been the most commonly adopted in recent imputation-aided genome-wide association studies (GWAS) and meta-analyses [14][15][16][17]

  • Existing methods have focused on common variants, which have been the focus of the past wave of GWAS using HapMap-based imputation

Introduction

Genotype imputation has become standard practice in modern genetic studies [1][2][3][4]. For each untyped variant imputed, standard imputation methods estimate posterior probabilities of all possible genotypes. When the untyped variant is biallelic with alleles A and B, we obtain posterior probabilities for A/A, A/B, and B/B, constrained to sum to one. Such probability information can be further summarized into degenerate one-dimensional summary statistics, including the mode (the best-guess genotype, that is, the genotype with the highest posterior probability) or the mean (the imputed dosage). Explicitly modeling the probabilities of all possible genotypes using the mixture of regression models (abbreviated Mixture hereafter and detailed below) has the best performance in terms of statistical efficiency, especially when imputation quality is low, but at the cost of increased computational complexity [13].
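
As a small illustration of the two one-dimensional summaries described above, the following Python sketch computes the imputed dosage (posterior mean) and the best-guess genotype (posterior mode) from one individual's three posterior probabilities. The function name `summarize_posterior`, the tolerance `tol`, and the convention that dosage counts copies of allele B are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence, Tuple

# Genotype labels in the order the probabilities are supplied (assumed order).
GENOTYPES = ("A/A", "A/B", "B/B")


def summarize_posterior(probs: Sequence[float], tol: float = 1e-6) -> Tuple[float, str]:
    """Return (dosage, best_guess) for one individual at one biallelic variant.

    probs holds (P(A/A), P(A/B), P(B/B)) and must sum to one.
    Dosage is the expected number of copies of allele B (assumed convention).
    """
    if len(probs) != 3:
        raise ValueError("expected exactly three genotype probabilities")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > tol:
        raise ValueError("probabilities must be non-negative and sum to one")

    _, p_ab, p_bb = probs
    # Posterior mean of the B-allele count: 0 * P(A/A) + 1 * P(A/B) + 2 * P(B/B).
    dosage = p_ab + 2.0 * p_bb
    # Posterior mode; ties are broken in favour of the first genotype listed.
    best_guess = GENOTYPES[max(range(3), key=lambda g: probs[g])]
    return dosage, best_guess


if __name__ == "__main__":
    # Two different posteriors that collapse to the same dosage of 1.0.
    print(summarize_posterior((0.0, 1.0, 0.0)))
    print(summarize_posterior((0.5, 0.0, 0.5)))
```

The two calls in the usage section show why these summaries are called degenerate: a confidently imputed heterozygote and an even split between the two homozygotes have the same dosage, so the dosage alone cannot distinguish them, whereas the full posterior probabilities can.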
