Abstract

Extreme phenotype sampling (EPS) is a popular study design used to reduce genotyping or sequencing costs. Assuming continuous phenotype data are available on a large cohort, EPS involves genotyping or sequencing only those individuals with extreme phenotypic values. Although this design has been shown to have high power to detect genetic effects even at smaller sample sizes, little attention has been paid to the effects of confounding variables, and in particular population stratification. Using extensive simulations, we demonstrate that the false positive rate under the EPS design is greatly inflated relative to a random sample of equal size or a “case-control”-like design where the cases are from one phenotypic extreme and the controls are randomly sampled. The inflated false positive rate is observed even with allele frequency and phenotype mean differences taken from European population data. We show that the effects of confounding are not reduced by increasing the sample size. We also show that including the top principal components in a logistic regression model is sufficient for controlling the type 1 error rate, both in data simulated with a population genetics model and in 1000 Genomes genotype data. Our results suggest that when an EPS study is conducted, it is crucial to adjust for all confounding variables. For genetic association studies this requires genotyping a sufficient number of markers to allow for ancestry estimation. Unfortunately, this could increase the costs of a study if sequencing or genotyping was planned only for candidate genes or pathways; the available genetic data would not be suitable for ancestry correction as many of the variants could have a true association with the trait.

Highlights

  • Extreme phenotype sampling (EPS), also called selective genotyping or trait- or outcome-dependent sampling, is a popular study design for increasing the power of genetic association studies

  • We have shown that the increased power of the EPS design comes at a cost of a greatly inflated false positive rate due to confounding by population stratification

  • We showed that, although the other designs also have inflated false positive rates, the inflation is most severe under the EPS design

Introduction

Extreme phenotype sampling (EPS), also called selective genotyping or trait- or outcome-dependent sampling, is a popular study design for increasing the power of genetic association studies. Assuming a large cohort with continuous phenotype data is available, EPS involves genotyping only those individuals in the top and bottom extremes of the phenotype distribution. The rationale for this design is that the phenotypic extremes are enriched for either deleterious or protective variants (Kryukov et al., 2009), so the power to detect genetic effects can be maintained even when genotyping only a small subset of a larger cohort (Lander and Botstein, 1989; Van Gestel et al., 2000; Kryukov et al., 2009; Guey et al., 2011; Barnett et al., 2013). The EPS design has been applied to whole-exome sequencing studies in order to find cystic fibrosis modifier genes (Emond et al., 2012), variants associated with pulmonary disease (Bruse et al., 2016), and variants associated with diabetic retinopathy (Shtir et al., 2016).
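
To make the design concrete, the following is a minimal Python sketch of how sampling from the phenotypic extremes can interact with population stratification. It is not the simulation used in the study. The two-subpopulation setup, the allele frequencies, the phenotype means, the sample sizes, and all function names are illustrative assumptions, and a simple trend test (n times the squared correlation, compared with a chi-square threshold on 1 degree of freedom) stands in for the logistic regression model described in the abstract. The simulated variant has no effect on the phenotype, so every rejection is a false positive.

```python
import numpy as np

CHI2_1DF_5PCT = 3.841  # 95th percentile of the chi-square distribution, 1 df


def simulate_cohort(rng, n_per_pop=5000, freqs=(0.3, 0.4), means=(0.0, 0.3)):
    """Two subpopulations differing in allele frequency and phenotype mean.

    All parameter values are arbitrary illustrative assumptions. The variant
    has no causal effect, so any association arises from stratification.
    """
    genotype = np.concatenate([rng.binomial(2, f, n_per_pop) for f in freqs])
    phenotype = np.concatenate([rng.normal(m, 1.0, n_per_pop) for m in means])
    return genotype, phenotype


def trend_test_rejects(x, y):
    """Unadjusted trend test: n * r^2 is roughly chi-square(1) under the null."""
    if np.std(x) == 0 or np.std(y) == 0:
        return False  # no variation, nothing to test
    r = np.corrcoef(x, y)[0, 1]
    return bool(len(x) * r ** 2 > CHI2_1DF_5PCT)


def false_positive_rates(n_reps=500, n_sample=1000, seed=0):
    """Return (random-sample rate, EPS rate) at a nominal 5% level."""
    rng = np.random.default_rng(seed)
    half = n_sample // 2
    hits_random = 0
    hits_eps = 0
    for _ in range(n_reps):
        genotype, phenotype = simulate_cohort(rng)

        # Random sample: test genotype against the continuous phenotype.
        idx = rng.choice(len(phenotype), size=n_sample, replace=False)
        hits_random += trend_test_rejects(genotype[idx], phenotype[idx])

        # EPS: keep the lowest and highest phenotypes, then test genotype
        # against an indicator of which extreme each individual came from.
        order = np.argsort(phenotype)
        extremes = np.concatenate([order[:half], order[-half:]])
        upper = np.concatenate([np.zeros(half), np.ones(half)])
        hits_eps += trend_test_rejects(genotype[extremes], upper)
    return hits_random / n_reps, hits_eps / n_reps


if __name__ == "__main__":
    rate_random, rate_eps = false_positive_rates()
    print(f"random sample: {rate_random:.3f}  EPS: {rate_eps:.3f}")
```

Because neither test adjusts for ancestry, both rates would be expected to exceed the nominal 5% level in this toy setup, with the EPS rate the larger of the two, since selecting on the phenotype also selects on subpopulation membership. The sketch does not include the principal component adjustment described in the abstract.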
