Abstract

Genotype errors are well known to increase type I errors and/or decrease power in tests of genotype-phenotype association, depending on whether the genotype error mechanism is associated with the phenotype. These relationships hold for both single-marker and multimarker tests of genotype-phenotype association. To assess the potential for genotype errors in Genetic Analysis Workshop 18 (GAW18) data, where no gold standard genotype calls are available, we explored concordance rates between sequencing, imputation, and microarray genotype calls. Our analysis shows that missing data rates for sequenced individuals are high and that there is a modest amount of called genotype discordance between the 2 platforms, with discordance most common for single-nucleotide polymorphisms (SNPs) with lower minor allele frequency (MAF). We observed some evidence that discordance rates differed between phenotypes, and we identified a number of cases where different technologies identified different bases at the variant site. Type I errors and power loss are possible in downstream analyses of GAW18 data as a result of missing genotypes and errors in called genotypes.

Highlights

  • Over the past decade, a large body of literature has been amassed related to genotype errors for single-nucleotide polymorphism (SNP) microarrays.

  • Across the 230,597,304 (240,456 SNPs × 959 individuals) possible genotype calls, there are more than 500,000 discordant genotypes and more than 5 million genotypes that are missing on at least 1 of the 2 platforms.

  • Despite sophisticated data-cleaning pipelines for all 3 technologies, a noticeable number of discordant genotypes remain in the Genetic Analysis Workshop 18 (GAW18) data.
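
The tallies in the highlights (concordant, discordant, and missing calls across 2 platforms) can be illustrated with a minimal sketch. The genotype matrices below are invented toy data, and the function name `tally_calls`, the 0/1/2 minor-allele-count coding, and the use of `None` for a missing call are assumptions made for illustration only; they are not the authors' pipeline. The last lines simply check the product of SNPs and individuals quoted above.

```python
# Toy genotype calls coded as minor-allele counts (0, 1, 2); None marks a missing call.
# Rows are SNPs, columns are individuals. All values are invented for illustration.
MISSING = None


def tally_calls(platform_a, platform_b):
    """Count concordant, discordant, and missing genotype pairs across two platforms."""
    counts = {"concordant": 0, "discordant": 0, "missing": 0}
    for row_a, row_b in zip(platform_a, platform_b):
        for call_a, call_b in zip(row_a, row_b):
            if call_a is MISSING or call_b is MISSING:
                counts["missing"] += 1  # missing on at least 1 of the 2 platforms
            elif call_a == call_b:
                counts["concordant"] += 1
            else:
                counts["discordant"] += 1
    return counts


# Hypothetical calls for 3 SNPs x 4 individuals on each platform.
sequencing = [[0, 1, 2, None], [0, 0, 1, 1], [2, None, 0, 1]]
microarray = [[0, 1, 1, 0], [0, 0, 1, None], [2, 2, 0, 0]]

counts = tally_calls(sequencing, microarray)

# Discordance rate among pairs called on both platforms.
called_on_both = counts["concordant"] + counts["discordant"]
discordance_rate = counts["discordant"] / called_on_both

# Total possible genotype calls quoted in the highlights: SNPs x individuals.
total_possible_calls = 240_456 * 959

print(counts, round(discordance_rate, 3), total_possible_calls)
```

Computing the discordance rate only over pairs called on both platforms keeps missingness and discordance as separate quantities, which matches how the highlights report them.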


Summary

Introduction

A large body of literature has been amassed related to genotype errors for SNP microarrays. We have a clear understanding of the prevalence of such errors and of many of their potential sources, as well as of the downstream implications of genotype errors for the type I error rate and power of related single-SNP tests of genotype-phenotype association [1]. Nondifferential genotyping errors, that is, errors that result from a random process unrelated to the phenotype, decrease power [2,3,4], whereas differential errors, those whose mechanism is associated with the phenotype, can increase the type I error rate. Recent papers demonstrate that similar results (i.e., decreased power for nondifferential genotyping errors and increased type I error for differential genotyping errors) hold for multimarker tests as well. Large error rates have been observed for sequence data [14-16], much larger than were typical in the early days of SNP microarrays [17]. There is therefore the potential for substantial power loss and inflated type I error for multimarker tests involving next-generation sequencing (NGS) data.
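
The direction of these effects can be seen in a small numeric sketch. It assumes a deliberately simple error model that is not taken from the paper: each allele is misread as the other allele with a fixed probability. The allele frequencies, error rates, and the helper name `observed_freq` are all hypothetical. Under this model, an error rate shared by cases and controls shrinks the observed allele frequency difference (less power), while error rates that differ by phenotype create a difference where none exists (inflated type I error).

```python
def observed_freq(true_freq, error_rate):
    """Allele frequency after symmetric allele misclassification at the given rate."""
    return true_freq * (1 - error_rate) + (1 - true_freq) * error_rate


# Nondifferential errors: same hypothetical error rate in cases and controls.
p_case, p_control = 0.30, 0.20  # hypothetical true minor allele frequencies
error_rate = 0.05               # hypothetical allele misclassification rate

true_diff = p_case - p_control
observed_diff = observed_freq(p_case, error_rate) - observed_freq(p_control, error_rate)
shrinkage = observed_diff / true_diff  # equals 1 - 2 * error_rate under this model

# Differential errors: no true association, but error rates differ by phenotype.
p_null = 0.20                           # same true frequency in both groups
e_case, e_control = 0.05, 0.01          # hypothetical phenotype-dependent error rates
spurious_diff = observed_freq(p_null, e_case) - observed_freq(p_null, e_control)

print(f"shrinkage factor: {shrinkage:.2f}")
print(f"spurious difference under the null: {spurious_diff:.3f}")
```

This is only an illustration of the mechanism on allele frequencies; it does not compute power or type I error rates for any particular test.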

