Abstract

The common disease/rare variant hypothesis predicts that rare variants with large effects will have a strong impact on corresponding phenotypes. Rare functional variants are therefore assumed to be enriched in the extremes of the phenotype distribution. In this analysis of the Genetic Analysis Workshop 17 data set, my aim is to detect genes with rare variants that are associated with quantitative traits using two general approaches: analyzing the association with the complete distribution of values by means of linear regression, and using statistical tests based on the tails of the distribution (bottom 10% of values versus top 10%). Three methods were used for the extreme phenotype approach: Fisher’s exact test, the weighted-sum method, and the beta method. Rare variants were collapsed at the gene level. Linear regression including all values provided the highest power to detect rare variants. Of the three methods used in the extreme phenotype approach, the beta method performed best. Furthermore, the sample for the extreme phenotype approach was enriched by adding further samples with extreme phenotype values. Doubling the sample size in this way, which corresponds to only 40% of the sample size of the original continuous trait, yielded comparable or even higher power than linear regression. If samples are selected primarily for sequencing, enriching the analysis by gathering a greater proportion of individuals with extreme values of the phenotype of interest, rather than sampling the general population, leads to higher power to detect rare variants than analyzing a population-based sample of equivalent size.
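The extreme phenotype approach described above can be illustrated with a minimal Python sketch: collapse the rare variants of one gene into a carrier indicator, select the bottom and top 10% of trait values, and compare carrier counts between the tails with Fisher's exact test. This is not the author's implementation. The function names, the genotype coding (minor allele counts 0/1/2), and the minor allele frequency cutoff of 0.01 are assumptions made for illustration; the weighted-sum and beta methods are not shown.

```python
from math import comb


def collapse_rare(genotypes, mafs, maf_cutoff=0.01):
    """Collapse one gene's rare variants into a 0/1 carrier indicator.

    genotypes: one row per individual, one minor allele count per variant.
    mafs: minor allele frequency of each variant.
    maf_cutoff: hypothetical rarity threshold (an assumption, not from the source).
    """
    rare = [j for j, maf in enumerate(mafs) if maf < maf_cutoff]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def select_tails(trait, fraction=0.10):
    """Return (bottom, top) index lists for the lowest and highest trait values."""
    order = sorted(range(len(trait)), key=lambda i: trait[i])
    k = max(1, int(len(trait) * fraction))
    return order[:k], order[-k:]


def tail_table(carrier, bottom, top):
    """2x2 counts: (top carriers, top non-carriers, bottom carriers, bottom non-carriers)."""
    a = sum(carrier[i] for i in top)
    c = sum(carrier[i] for i in bottom)
    return a, len(top) - a, c, len(bottom) - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Toy data (entirely made up): 20 individuals, carriers only among the highest values.
toy_trait = [float(v) for v in range(20)]
toy_carrier = [0] * 18 + [1, 1]
toy_bottom, toy_top = select_tails(toy_trait)
toy_p = fisher_exact_two_sided(*tail_table(toy_carrier, toy_bottom, toy_top))
print(f"Fisher's exact test on the tails: p = {toy_p:.3f}")
```

With only 10% of individuals in each tail, the 2x2 table is small, which is one way to see why enlarging the tails by adding further individuals with extreme values can recover power.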

Highlights

  • Genome-wide association studies effectively identify new common loci and pathways, but a large amount of estimated heritability is still unexplained [1]

  • A lot of information might be lost, leading to diminished power. Is this power loss merely due to a smaller sample size? With these considerations in mind, my aim in this analysis of the Genetic Analysis Workshop 17 (GAW17) data set is to detect genes with rare variants that are associated with quantitative traits using two general approaches: analyzing the association with the complete distribution of values by means of linear regression, and performing statistical tests based on the tails of the distribution

  • Comparing type I error and power: for Q4, which serves as a control phenotype, the type I error was inflated for linear regression (0.10) and was preserved to slightly elevated for the other methods (0.02–0.07)


Summary

Introduction

Genome-wide association studies effectively identify new common loci and pathways, but a large amount of estimated heritability is still unexplained [1]. When an extreme trait approach is applied to a continuous phenotype, much of the overall distribution of the phenotype is neglected, and some rare genetic variants with only moderate effects may be missed. In contrast to this assumption, the extreme phenotype approach was more successful than linear regression on all values in a recent study on free fatty acids [9]. In this analysis, a linear regression on the complete phenotype distribution was used, together with methods designed for rare variants in a case-control design [10], which were applied by focusing on the extremes of the phenotype. The case-control design shows an excess of rare variants in the upper tail of the distribution.
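The regression on the complete phenotype distribution can be sketched as a simple linear model of the trait on a gene-level carrier indicator. The sketch below is an illustration under stated assumptions, not the author's analysis: the function name is hypothetical, no covariates are included, and the p-value uses a normal approximation to the t statistic, which is reasonable only for large samples.

```python
from math import sqrt
from statistics import NormalDist, mean


def regress_trait_on_carrier(trait, carrier):
    """Ordinary least squares of a quantitative trait on a 0/1 carrier indicator.

    Returns (slope, standard error, approximate two-sided p-value).
    The p-value uses a normal approximation (an assumption of this sketch).
    """
    n = len(trait)
    x_bar, y_bar = mean(carrier), mean(trait)
    sxx = sum((x - x_bar) ** 2 for x in carrier)
    if sxx == 0:
        raise ValueError("carrier indicator is constant; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(carrier, trait))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and the standard error of the slope.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(carrier, trait))
    se = sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return slope, se, 0.0  # perfect fit: treat as maximally significant
    z = slope / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, se, p_value


# Toy data (entirely made up): carriers have higher trait values on average.
example_trait = [1.0, 2.0, 3.0, 6.0, 7.0, 9.0]
example_carrier = [0, 0, 0, 1, 1, 1]
example_slope, example_se, example_p = regress_trait_on_carrier(example_trait, example_carrier)
print(f"slope = {example_slope:.3f}, se = {example_se:.3f}, p = {example_p:.2g}")
```

With a binary predictor, the slope equals the difference in mean trait value between carriers and non-carriers, so every individual contributes to the estimate, not only those in the tails.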

