Abstract

Genotyping errors are well known to impact the power and type I error rate in single marker tests of association. Genotyping errors that happen according to the same process in cases and controls are known as non-differential genotyping errors, whereas genotyping errors that occur with different processes in the cases and controls are known as differential genotyping errors. For single marker tests, non-differential genotyping errors reduce power, while differential genotyping errors increase the type I error rate. However, little is known about the behavior of the new generation of rare variant tests of association in the presence of genotyping errors. In this manuscript we use a comprehensive simulation study to explore the effects of numerous factors on the type I error rate of rare variant tests of association in the presence of differential genotyping error. We find that increased sample size, decreased minor allele frequency, and an increased number of single nucleotide variants (SNVs) included in the test all increase the type I error rate in the presence of differential genotyping errors. We also find that the greater the relative difference in case-control genotyping error rates, the larger the type I error rate. Lastly, as is the case for single marker tests, genotyping errors classifying the common homozygote as the heterozygote inflate the type I error rate significantly more than errors classifying the heterozygote as the common homozygote. In general, our findings are in line with results from single marker tests. To ensure that type I error inflation does not occur when analyzing next-generation sequencing data, careful consideration of study design (e.g., use of randomization), caution in meta-analysis and in using publicly available controls, and the use of standard quality control metrics are critical.

Highlights

  • In anticipation of a tidal wave of next-generation sequencing data from large case-control studies, numerous statistical tests intended to boost statistical power have been proposed

  • We found that even at very low genotype error rates, misclassifying common homozygotes as heterozygotes translates into substantial power loss for rare variant tests, an effect that is magnified as the minor allele frequency (MAF) at the site decreases

  • Rare variant tests used to analyze data: This paper examines the effects of differential genotyping error through consideration of five commonly used rare variant tests of association: Combined Multivariate and Collapsing (CMC) [1], Weighted-Sum (WS) [2], Proportion Regression (PR) [5], Cumulative Minor Allele Test (CMAT) [7], and Sequence Kernel Association Test (SKAT) [11]

Summary

Introduction

In anticipation of a tidal wave of next-generation sequencing data from large case-control studies, numerous statistical tests intended to boost statistical power have been proposed. These tests attempt to aggregate genotype-phenotype association across numerous single nucleotide variant sites in a region of interest [1,2,3,4,5,6,7,8,9,10,11]. Other simulation [17] and mathematical [Liu et al., unpublished manuscript] analyses have attempted to better understand the behavior of these tests.
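The aggregation idea, and the way differential genotyping error can inflate the type I error rate of an aggregating test, can be illustrated with a minimal simulation sketch. This is not the simulation design used in the paper and it does not implement CMC, WS, PR, CMAT, or SKAT. It assumes a simple collapsing test (a two-proportion z-test on whether an individual shows at least one minor allele call in the region), Hardy-Weinberg proportions, independent sites with a common minor allele frequency, and only errors that call a common homozygote as a heterozygote. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def observed_carrier_prob(maf, n_snvs, err_hom_to_het):
    """P(an individual shows >= 1 minor allele call across n_snvs sites)."""
    # A site is called "no minor allele" only if the person is truly a common
    # homozygote (probability (1 - maf)^2 under Hardy-Weinberg, an assumption)
    # AND that call is not flipped to a heterozygote by genotyping error.
    p_clean_site = (1.0 - maf) ** 2 * (1.0 - err_hom_to_het)
    # Sites are assumed independent, so per-site probabilities multiply.
    return 1.0 - p_clean_site ** n_snvs


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled z-test comparing two proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation in carrier status, nothing to test
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def type_i_error_rate(n_per_group, maf, n_snvs, err_cases, err_controls,
                      alpha=0.05, n_reps=400, seed=1):
    """Fraction of null replicates (no true association) rejected at alpha."""
    rng = random.Random(seed)
    # Cases and controls share the same true genotype distribution; only the
    # genotyping error rate differs between the two groups.
    q_cases = observed_carrier_prob(maf, n_snvs, err_cases)
    q_controls = observed_carrier_prob(maf, n_snvs, err_controls)
    rejections = 0
    for _ in range(n_reps):
        x_cases = sum(rng.random() < q_cases for _ in range(n_per_group))
        x_ctrls = sum(rng.random() < q_controls for _ in range(n_per_group))
        p = two_proportion_p_value(x_cases, n_per_group, x_ctrls, n_per_group)
        if p < alpha:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    # Illustrative settings only: 1000 cases, 1000 controls, MAF 0.5%, 10 SNVs.
    print("no error:          ", type_i_error_rate(1000, 0.005, 10, 0.0, 0.0))
    print("1% error, cases only:", type_i_error_rate(1000, 0.005, 10, 0.01, 0.0))
```

Under these assumptions, the rejection rate with no genotyping error should sit near the nominal level, while a small error rate confined to cases should push it far above that level, because spurious heterozygote calls accumulate across every site included in the collapsed indicator.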

