Use of the Multivariate Discriminant Analysis for Genome-Wide Association Studies in Cattle.

Elisabetta Manca, Giustino Gaspa, Alberto Cesarani, Nicolò P.P. Macciotta, Corrado Dimauro, Silvia Sorbolini

doi:10.3390/ani10081300

Abstract

Simple Summary

In the traditional single marker regression approach for genome-wide association studies, if the number of individuals involved is small and the number of single nucleotide polymorphisms (SNPs) to be tested is very large, the probability of getting a significant association simply due to chance becomes enormous. Other techniques, such as the Bayesian methods, require several a priori assumptions, such as a posterior inclusion probability threshold fixed in advance, that can limit their effectiveness. In the present study, a multivariate algorithm able to partially overcome this problem was proposed. On simulated data, with 3000 individuals, only 13 and 3 quantitative trait loci (QTLs) were obtained with the single marker regression and the Bayesian approaches, respectively. On the other hand, the multivariate algorithm detected 65 QTLs in the same scenario. The gap between the single marker regression and the multivariate methods slowly decreased as the number of animals increased. This pattern was also confirmed on real data.

Genome-wide association studies (GWAS) are traditionally carried out by using the single marker regression model which, if a small number of individuals is involved, often leads to very few associations. The Bayesian methods, such as BayesR, have obtained encouraging results when applied to GWAS. However, these approaches require that a posterior inclusion probability threshold be fixed a priori, thus arbitrarily affecting the obtained associations. To partially overcome these problems, a multivariate statistical algorithm was proposed. The basic idea was that animals with different phenotypic values of a specific trait carry different allelic combinations for the genes involved in its determinism. Three multivariate techniques were used to highlight the differences between the individuals assembled in high and low phenotype groups: the canonical discriminant analysis, the discriminant analysis and the stepwise discriminant analysis.
The multivariate method was tested on both simulated and real data. The results from the simulation study highlighted that the multivariate GWAS detected a greater number of true associated single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs) than the single marker model and the Bayesian approach. For example, with 3000 animals, the traditional GWAS highlighted only 29 significantly associated markers and 13 QTLs, whereas the multivariate method found 127 associated SNPs and 65 QTLs. The gap between the two approaches slowly decreased as the number of animals increased. The Bayesian method gave worse results than the other two. On average, with the real data, the multivariate GWAS found 108 associated markers for each trait under study and, among them, around 63% of the SNPs were also found with the single marker approach. Among the top 118 associated markers, 76 SNPs harbored putative candidate genes.

Highlights

Genome-wide association studies (GWAS) are mainly aimed at understanding the genetic background of complex traits by relating a large number of single nucleotide polymorphism (SNP) genotypes to observed phenotypes.
Results of multivariate GWAS (M-GWAS) and traditional GWAS (T-GWAS) applied to 20 sub-datasets extracted from the simulated population are summarized in Table 1, which reports, for each sample, the number of associated SNPs, the true associated SNPs and the detected QTLs.
The associated markers were the minimum number of SNPs able to significantly separate low phenotype (LP) from high phenotype (HP) animals and to correctly assign the animals to their true group of origin.

Summary

Introduction

Genome-wide association studies (GWAS) are mainly aimed at understanding the genetic background of complex traits by relating a large number of single nucleotide polymorphism (SNP) genotypes to observed phenotypes. When the number of SNPs to be tested is very large, the probability of getting a significant association due to chance becomes enormous. This multiple-testing effect is often controlled by using the Bonferroni correction, which requires that all tests be independent of each other [1]. This assumption generally does not hold because, as the marker density increases, tests become more correlated due to the linkage disequilibrium among adjacent SNPs.
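
The multiple-testing problem described above can be illustrated with a short calculation. The following Python sketch is not taken from the study: the function names, the significance level of 0.05 and the marker counts are hypothetical choices made only for illustration. It computes the probability of at least one chance association when every test is run at the same nominal level, and the per-test threshold given by the Bonferroni correction. Both formulas assume independent tests, which is exactly the assumption that linkage disequilibrium among adjacent SNPs violates.

```python
import math


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    independent tests, each run at significance level alpha."""
    # 1 - (1 - alpha)^n_tests, computed via log1p/expm1 so that very
    # small alpha values do not lose precision.
    return -math.expm1(n_tests * math.log1p(-alpha))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05  # hypothetical nominal level, not taken from the study
    for n_snps in (1, 100, 50_000):  # hypothetical marker counts
        uncorrected = familywise_error_rate(alpha, n_snps)
        per_test = bonferroni_threshold(alpha, n_snps)
        corrected = familywise_error_rate(per_test, n_snps)
        print(
            f"{n_snps:>6} SNPs: uncorrected FWER = {uncorrected:.4f}, "
            f"Bonferroni per-test level = {per_test:.2e}, "
            f"corrected FWER = {corrected:.4f}"
        )
```

With correlated markers the effective number of independent tests is smaller than the number of SNPs, so the Bonferroni threshold computed this way tends to be more stringent than necessary.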

Objectives

Methods

Results

Discussion

Conclusion