Abstract Genome-wide association studies (GWAS) using SNPs aim to capture QTL effects across the genome. However, GWAS results vary across studies, even when traits and populations are similar. Fluctuations in estimated SNP effects and variances can be caused by biological factors, such as selection, the genetic architecture of the trait, and population stratification, and by nonbiological factors, such as the estimation method and inflation of the prediction error for SNP effects. The current study compares changes in SNP effects and genomic estimated breeding values (GEBVs) across different training populations. A simulation study was conducted using data from 20,000 individuals generated with the AlphaSimR package in R. A trait was simulated on a genome of 10 chromosomes, and linkage disequilibrium was created by random crossing across 1,000 generations. The trait was controlled by 100 loci on each chromosome, which together accounted for 100% of the genetic variation. Pedigree, phenotypes, and genotypes were created for all individuals in the population, but phenotypes for individuals in the last generation were masked for validation. Three training populations were then defined, comprising all animals, only odd animals, and only even animals. GEBVs were estimated from the genomic information by single-step GBLUP. SNP effects and variances were calculated by back-solving the GEBVs. The correlation of GEBVs was 0.83 between the all and odd populations, 0.81 between the all and even populations, and 0.40 between the odd and even populations. The correlation of all 50k SNP effects was 0.75 between the all and odd populations, 0.75 between the all and even populations, and 0.20 between the odd and even populations. Halving the training data led to only small changes in accuracy. However, SNP effects fluctuated across populations more than GEBVs did. Visual inspection of Manhattan plots showed that the peak regions varied with the training population used. 
The top regions were the most stable, while the intermediate regions varied more often. The largest differences were between the odd and even scenarios. The regions explaining the most variance differed among scenarios, and the differences were greater when less data were used for training (odd and even). Overall, the fluctuations in SNP effects in GWAS highlight the complexity of genetic influences on traits and how sensitive such studies are to small changes in the data. Understanding the factors that contribute to GWAS fluctuations may help improve the design and interpretation of such studies. We expect to derive a novel tool that uses the fluctuations of SNP effects to leverage QTL discovery.
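
The comparison described above can be illustrated with a small, self-contained sketch. It is not the study's pipeline: it swaps single-step GBLUP for plain GBLUP with all animals genotyped and no pedigree, simulates independent SNPs without linkage disequilibrium instead of using AlphaSimR, and uses toy sizes. The variance ratio `lam`, the population sizes, and all function names are assumptions made for illustration. SNP effects are back-solved as a = Z'(G + lam*I)^-1 (y - mean) / k, with G = ZZ'/k and k = 2 * sum(p(1 - p)), so that Z a reproduces the GBLUP breeding values of the training animals.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def backsolved_snp_effects(geno, pheno, lam=1.0):
    """GBLUP SNP effects; lam = residual/genetic variance ratio (assumed)."""
    n, m = len(geno), len(geno[0])
    freq = [sum(row[j] for row in geno) / (2 * n) for j in range(m)]
    k = 2 * sum(p * (1 - p) for p in freq)
    z = [[row[j] - 2 * freq[j] for j in range(m)] for row in geno]  # centered
    g = [[sum(a * b for a, b in zip(zi, zk)) / k for zk in z] for zi in z]
    for i in range(n):
        g[i][i] += lam  # G + lam * I
    ybar = sum(pheno) / n
    v = solve(g, [y - ybar for y in pheno])
    return [sum(z[i][j] * v[i] for i in range(n)) / k for j in range(m)]


# Toy simulation (hypothetical sizes, far smaller than the study's).
rng = random.Random(1)
n_train, n_valid, n_snp, n_qtl = 80, 20, 150, 15
freq = [rng.uniform(0.1, 0.9) for _ in range(n_snp)]
geno = [[(rng.random() < p) + (rng.random() < p) for p in freq]
        for _ in range(n_train + n_valid)]
true_eff = [0.0] * n_snp
for j in rng.sample(range(n_snp), n_qtl):
    true_eff[j] = rng.gauss(0, 1)
tbv = [sum(x * a for x, a in zip(row, true_eff)) for row in geno]
mean_tbv = sum(tbv) / len(tbv)
sd_tbv = (sum((t - mean_tbv) ** 2 for t in tbv) / len(tbv)) ** 0.5
pheno = [t + rng.gauss(0, sd_tbv) for t in tbv]  # heritability near 0.5

# Training sets: all animals, odd IDs only, even IDs only (IDs start at 1).
train = {
    "all": list(range(n_train)),
    "odd": [i for i in range(n_train) if (i + 1) % 2 == 1],
    "even": [i for i in range(n_train) if (i + 1) % 2 == 0],
}
effects = {name: backsolved_snp_effects([geno[i] for i in idx],
                                        [pheno[i] for i in idx])
           for name, idx in train.items()}

# GEBVs of validation animals (phenotypes unused), up to a constant shift.
gebv = {name: [sum(x * a for x, a in zip(row, eff)) for row in geno[n_train:]]
        for name, eff in effects.items()}

pairs = [("all", "odd"), ("all", "even"), ("odd", "even")]
snp_cor = {p: pearson(effects[p[0]], effects[p[1]]) for p in pairs}
gebv_cor = {p: pearson(gebv[p[0]], gebv[p[1]]) for p in pairs}
for p in pairs:
    print(p, "SNP effects:", round(snp_cor[p], 2), "GEBV:", round(gebv_cor[p], 2))
```

Because the odd and even sets share no animals while each shares half of its data with the full set, this layout mirrors the three scenarios in the abstract; the printed correlations come from the toy simulation only and are not expected to match the reported values.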