Abstract
Recently there has been great interest in identifying rare variants associated with common diseases. We apply several collapsing-based and kernel-based single-gene association tests to Genetic Analysis Workshop 17 (GAW17) rare variant association data with unrelated individuals without knowledge of the simulation model. We also implement modified versions of these methods using additional information, such as minor allele frequency (MAF) and functional annotation. For each of four given traits provided in GAW17, we use the Bayesian mixed-effects model to estimate the phenotypic variance explained by the given environmental and genotypic data and to infer an individual-specific genetic effect to use directly in single-gene association tests. After obtaining information on the GAW17 simulation model, we compare the performance of all methods and examine the top genes identified by those methods. We find that collapsing-based methods with weights based on MAFs are sensitive to the “lower MAF, larger effect size” assumption, whereas kernel-based methods are more robust when this assumption is violated. In addition, many false-positive genes identified by multiple methods often contain variants with exactly the same genotype distribution as the causal variants used in the simulation model. When the sample size is much smaller than the number of rare variants, it is more likely that causal and noncausal variants will share the same or similar genotype distribution. This likely contributes to the low power and large number of false-positive results of all methods in detecting causal variants associated with disease in the GAW17 data set.
Highlights
To date, genome-wide association studies (GWAS) have been successful in unveiling many common singlenucleotide polymorphisms (SNPs) associated with common diseases, including type 1 and type 2 diabetes, rheumatoid arthritis, Crohn’s disease, and coronary heart disease [1,2,3]
One possible explanation of this limitation is that GWAS have focused mainly on variants that are common, whereas many disease-causing variants
We found that Q1 is affected by both given genotype data and environmental variables; in contrast, Q2 is mainly affected by genetic but not environmental variables, and Q4 is not affected by any given genotypic data (Table 1). This procedure was performed without knowing the Genetic Analysis Workshop 17 (GAW17) simulation answers, the observed pattern agrees well with the answers
Summary
Genome-wide association studies (GWAS) have been successful in unveiling many common singlenucleotide polymorphisms (SNPs) associated with common diseases, including type 1 and type 2 diabetes, rheumatoid arthritis, Crohn’s disease, and coronary heart disease [1,2,3]. One possible explanation of this limitation is that GWAS have focused mainly on variants that are common (minor allele frequency [MAF] > 5%), whereas many disease-causing variants. The advent of next-generation sequencing technology has offered great opportunities for discovering novel rare variants in the human genome, associating these rare variants with diseases, and increasing our biological knowledge of disease etiology. Whole-exome sequencing technology has emerged as a powerful paradigm for the identification of rare variants associated with diseases. This technology was used in the pilot study of the 1000 Genomes Project [5], from which the Genetic Analysis Workshop 17 (GAW17) mini-exome data were generated
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.