Abstract
The ability to use genome-wide association studies (GWAS) for genetic discovery depends upon our ability to distinguish true causative from false positive association signals. Population structure (PS) has been shown to cause false positive signals in GWAS. PS correction is routinely used in the analysis of human GWAS results, and it has been assumed that it should also be used for murine GWAS using inbred strains. Nevertheless, there are fundamental differences between murine and human GWAS, and the impact of PS on murine GWAS results has not been carefully investigated. To assess this impact, we examined 8223 datasets that characterized biomedical responses in panels of inbred mouse strains. Rather than treat PS as a confounding variable, we examined it as a response variable. Surprisingly, we found that PS had a minimal impact on datasets measuring responses in ≤20 strains, and had little impact on most datasets characterizing 21–40 inbred strains. Moreover, we show that true positive association signals arising from haplotype blocks, SNPs or indels, which were experimentally demonstrated to be causative for trait differences, would be rejected if PS correction were applied to them. Our results indicate that, because of the special conditions created by murine GWAS (the use of inbred strains and small sample sizes), PS assessment results should be carefully evaluated in conjunction with other criteria when murine GWAS results are interpreted.
Highlights
Because of ancestral relatedness among the individuals within an analyzed population, a genome-wide association study (GWAS) will identify a true causative genetic variant along with multiple other false positive associations, some of which arise because of commonly inherited genetic regions within a sub-population.
We first examined the percentage of the variance that was explained when a variable number of principal components (PCs) were used for the principal component analysis (PCA); the number of PCs ranged from 1 to 33 because < 33 inbred strains were analyzed in any dataset.
While population structure (PS) correction helps to eliminate false positives in human genetic studies, we found that PS makes a smaller than expected contribution to most murine GWAS.
Summary
Because of ancestral relatedness among the individuals within an analyzed population, a GWAS will identify a true causative genetic variant along with multiple other false positive associations, some of which arise because of commonly inherited genetic regions within a sub-population. This property, which is referred to as ‘population structure’ (PS) and has been shown to exist in populations ranging from plants (Zhao et al, 2007) to humans (Reich and Goldstein, 2001; Yu et al, 2006), inflates the number of false positive results obtained in a GWAS. Principal component analysis (PCA) has two advantages over using the population structure matrix: (i) the finite number of subpopulations does not have to be specified prior to the analysis, which can be an arbitrary process that introduces errors; and (ii) it is far more computationally efficient, which is important when many individuals with many SNPs are evaluated.
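The idea of measuring how much genotype variance a given number of principal components explains can be illustrated with a minimal sketch. It is not the authors' pipeline: the genotype matrix is simulated, the two-subpopulation design and the function names (`explained_variance_ratio`, `simulate_two_subpopulations`) are assumptions made for illustration, and only NumPy is used.

```python
import numpy as np


def explained_variance_ratio(genotypes: np.ndarray) -> np.ndarray:
    """Fraction of total genotype variance captured by each principal component.

    genotypes: (n_strains, n_snps) matrix of allele codes (0/1 here).
    """
    # Center each SNP column so the PCs describe variation between strains.
    centered = genotypes - genotypes.mean(axis=0)
    # Squared singular values of the centered matrix are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def simulate_two_subpopulations(n_per_group=10, n_snps=500, seed=0):
    """Hypothetical panel: two groups of strains with different allele frequencies."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, n_snps)
    freq_b = rng.uniform(0.1, 0.9, n_snps)
    group_a = rng.random((n_per_group, n_snps)) < freq_a
    group_b = rng.random((n_per_group, n_snps)) < freq_b
    return np.vstack([group_a, group_b]).astype(float)


genotypes = simulate_two_subpopulations()
ratios = explained_variance_ratio(genotypes)
cumulative = np.cumsum(ratios)

# Report the cumulative variance explained by the first k PCs.
for k in (1, 2, 5, 10):
    print(f"first {k:2d} PCs explain {cumulative[k - 1]:.1%} of the variance")
```

Note that no number of subpopulations is passed in anywhere: the leading components pick up the group structure directly from the genotypes, which is the first advantage described above. With real data, the simulated matrix would be replaced by the strains × variants genotype matrix for the panel being analyzed.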