Abstract
When testing for association between rare variants and diseases, an efficient analytical approach is to treat a set of variants in a genomic region as the unit of analysis. One complicating factor is that the vast majority of rare variants in practical applications are believed to represent neutral background variation. As a result, analyzing a single set containing all variants may not be a powerful approach. Here, we propose two alternative strategies. In the first, we analyze the subsets of rare variants exhaustively. In the second, we categorize variants selectively into two subsets: one in which variants are overrepresented in cases, and one in which variants are overrepresented in controls. We show by simulation that, when the proportion of neutral variants is moderate to large, both proposed strategies improve statistical power over methods that analyze a single set containing all variants. When applied to a real sequencing association study, the proposed methods consistently produce smaller p-values than their competitors. When applied to another real sequencing dataset to study differences in rare allele distributions between ethnic populations, the proposed methods detect the overrepresentation of variants between the CHB (Chinese Han in Beijing) and YRI (Yoruba people of Ibadan) populations with small p-values. Additional analyses suggest that there is no difference between the CHB and CHD (Chinese Han in Denver) datasets, as expected. Finally, when applied to the CHB and JPT (Japanese people in Tokyo) populations, existing methods fail to detect any difference, whereas the proposed methods detect differences in several regions.
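The second strategy can be illustrated with a minimal sketch. The abstract does not specify the test statistic, so everything below is an assumption made for illustration: the function names, the 0/1 carrier coding, the burden-style statistic, and the permutation scheme are hypothetical and are not the authors' method. The sketch splits variants by the direction of their carrier-frequency difference, combines the two subsets into one statistic, and repeats the split inside every permutation, because the subsets are chosen using the case/control labels.

```python
import random


def split_by_direction(genotypes, labels):
    """Partition variant indices by where carriers are more frequent.

    genotypes: one list per individual of 0/1 carrier indicators (assumed coding).
    labels: 1 for case, 0 for control; both groups must be non-empty.
    Returns (case_enriched, control_enriched); tied variants are dropped.
    """
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    case_set, ctrl_set = [], []
    for j in range(len(genotypes[0])):
        in_cases = sum(g[j] for g, y in zip(genotypes, labels) if y == 1)
        in_ctrls = sum(g[j] for g, y in zip(genotypes, labels) if y == 0)
        diff = in_cases / n_case - in_ctrls / n_ctrl
        if diff > 0:
            case_set.append(j)
        elif diff < 0:
            ctrl_set.append(j)
    return case_set, ctrl_set


def burden_gap(genotypes, labels, subset):
    """Mean carrier count over `subset` in cases minus the same mean in controls."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    case_total = sum(sum(g[j] for j in subset)
                     for g, y in zip(genotypes, labels) if y == 1)
    ctrl_total = sum(sum(g[j] for j in subset)
                     for g, y in zip(genotypes, labels) if y == 0)
    return case_total / n_case - ctrl_total / n_ctrl


def two_subset_statistic(genotypes, labels):
    """Hypothetical statistic: case-enriched gap minus control-enriched gap."""
    case_set, ctrl_set = split_by_direction(genotypes, labels)
    return (burden_gap(genotypes, labels, case_set)
            - burden_gap(genotypes, labels, ctrl_set))


def permutation_p_value(genotypes, labels, n_perm=999, seed=0):
    """Permutation p-value; the subset split is redone for each shuffled labeling."""
    rng = random.Random(seed)
    observed = two_subset_statistic(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if two_subset_statistic(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Redoing the split within each permutation matters: a statistic built from subsets selected on the observed labels is biased upward, and comparing it against permutations that reuse the observed split would understate the p-value.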
Highlights
Genome-wide association studies (GWAS) have become popular tools for identifying genetic susceptibility variants for complex diseases. The success of this approach relies on the common disease-common variant (CDCV) hypothesis, which presumes that phenotypic variation of common diseases is explained by several common variants, each with a relatively small effect [1,2,3].
Extensive studies have provided an alternative to the CDCV hypothesis, termed the common disease-rare variant (CDRV) hypothesis, that may be important to the etiology of complex diseases [5,6,7].
It seems that either the CDCV or the CDRV hypothesis holds under certain conditions, and that the etiology of complex disease reflects a mixture of both hypotheses along with effects from other factors, e.g., gene-by-gene interactions and the environment.
Summary
Genome-wide association studies (GWAS) have become popular tools for identifying genetic susceptibility variants for complex diseases. The success of this approach relies on the common disease-common variant (CDCV) hypothesis, which presumes that phenotypic variation of common diseases is explained by several common variants, each with a relatively small effect [1,2,3]. Although each individual mutation has a low frequency, the aggregate frequency of rare mutations across a gene or pathway can be substantial, which makes it possible for rare variants to cause common diseases. It seems that either the CDCV or the CDRV hypothesis holds under certain conditions, and that the etiology of complex disease reflects a mixture of both hypotheses along with effects from other factors, e.g., gene-by-gene interactions and the environment.
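The point about aggregate frequency can be made concrete with a short calculation. This is a sketch under stated assumptions that are not in the source: the $m$ variants in a gene are independent, genotypes follow Hardy-Weinberg proportions, and variant $j$ has minor allele frequency $p_j$. The probability that an individual carries at least one rare allele in the gene is then

\[
P(\text{carrier}) \;=\; 1 - \prod_{j=1}^{m} (1 - p_j)^{2}.
\]

With purely illustrative numbers, $m = 50$ variants each at $p_j = 0.001$ give

\[
P(\text{carrier}) \;=\; 1 - 0.999^{100} \;\approx\; 0.095,
\]

so roughly one individual in ten carries a rare allele in the gene even though each variant on its own is seen in about one individual in five hundred.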