Abstract

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits and diseases. Compared with analyzing a single phenotype at a time, the joint analysis of multiple phenotypes can improve statistical power by exploiting information shared across phenotypes. However, most established joint methods ignore the differing levels of correlation among phenotypes and instead analyze all phenotypes simultaneously in a single genetic model. As a result, they may fail to capture the genetic structure of the phenotypes and lose statistical power. In this study, we develop a novel method, the agglomerative nesting clustering algorithm for phenotypic dimension reduction analysis (AGNEP), to jointly analyze multiple phenotypes in GWAS. First, AGNEP uses an agglomerative nesting clustering algorithm to group correlated phenotypes; it then applies principal component analysis (PCA) to generate representative phenotypes for each group. Finally, multivariate analysis is used to test associations between genetic variants and the representative phenotypes rather than all phenotypes. We perform three simulation experiments with various genetic structures and a real-data analysis of 19 Arabidopsis phenotypes. Compared with established methods, AGNEP performs better in terms of statistical power, computing time, and the number of quantitative trait nucleotides (QTNs) detected. The analysis of the real Arabidopsis dataset further illustrates the efficiency of AGNEP in detecting QTNs, which are confirmed by The Arabidopsis Information Resource gene bank.
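The three steps described in the abstract (cluster correlated phenotypes, summarize each group by PCA, test each variant against the representatives) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the average linkage on a 1 - |r| distance, the 0.5 cut-off, the choice of one principal component per group, and the Wilks' lambda F-test for a single variant are all assumed here, and the function names are hypothetical. The data are simulated only to make the sketch runnable.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from scipy.stats import f as f_dist


def cluster_phenotypes(Y, threshold=0.5):
    """Group phenotype columns bottom-up, using 1 - |r| as the distance.

    Average linkage and the threshold are assumptions of this sketch.
    """
    corr = np.corrcoef(Y, rowvar=False)
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=threshold, criterion="distance")


def representative_phenotypes(Y, labels):
    """Return the first principal component score of each phenotype group."""
    reps = []
    for k in np.unique(labels):
        block = Y[:, labels == k]
        block = (block - block.mean(axis=0)) / block.std(axis=0)
        U, s, _ = np.linalg.svd(block, full_matrices=False)
        reps.append(U[:, 0] * s[0])  # PC1 scores
    return np.column_stack(reps)


def wilks_test(R, g):
    """Test one variant against all representatives jointly.

    Uses Wilks' lambda with the F transform that is exact when the
    hypothesis has one degree of freedom (additive genotype coding).
    """
    n, p = R.shape
    Rc = R - R.mean(axis=0)
    gc = g - g.mean()
    beta = gc @ Rc / (gc @ gc)           # per-phenotype regression slopes
    resid = Rc - np.outer(gc, beta)
    lam = np.linalg.det(resid.T @ resid) / np.linalg.det(Rc.T @ Rc)
    F = (1.0 - lam) / lam * (n - p - 1) / p
    return lam, f_dist.sf(F, p, n - p - 1)


# Simulated example: two groups of three phenotypes; the variant acts on group 1.
rng = np.random.default_rng(0)
n = 500
g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
f1 = 0.5 * g + rng.normal(size=n)               # latent factor with variant effect
f2 = rng.normal(size=n)                         # latent factor without effect
Y = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=n) for _ in range(3)]
)

labels = cluster_phenotypes(Y)
R = representative_phenotypes(Y, labels)
lam, p_value = wilks_test(R, g)
print(labels, R.shape, lam, p_value)
```

Testing two representatives instead of six raw phenotypes lowers the dimension of the multivariate test, which is the motivation the abstract gives for grouping before testing.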

Highlights

  • Genome-wide association study (GWAS) is a powerful tool for exploring associations between genetic variants and phenotypes

  • To evaluate the performance of different methods, we focus on 19 quantitative phenotypes: days to flowering under long days (LD), days to flowering under LD with vernalization (LDV), days to flowering under short days (SD), days to flowering under SD with vernalization (SDV), days to flowering at 10, 16, and 22°C (FT10, FT16, and FT22), days to flowering with 8 weeks vernalization in greenhouse (8WGHFT), leaf number at flowering with 8 weeks vernalization in greenhouse (8WGHLN), days to flowering in field (FTF), diameter of plants at flowering in field (FTD), leaf number at 10, 16, and 22°C (LN10, LN16, and LN22), plant diameter at 10, 16, and 22°C (Width10, Width16, and Width22), and presence of leaf serration at 16 and 22°C (Leafserr16 and Leafserr22)

  • To evaluate the performance of five multivariate methods (MANOVA; the hierarchical clustering method with mean representative phenotypes, HCMM; AGNES for phenotypic dimension reduction analysis, AGNEP; AGNES with mean representative phenotypes, AGNEm; and AGNES with median representative phenotypes, AGNEmed) and one univariate method (ANOVA), we conduct three simulations: independent phenotypic groups in simulation I (Figure 1A), correlated groups in simulation II (Figure 1B), and high-dimensional phenotypes divided into eight groups in simulation III (Figure 1C)

Summary

Introduction

Genome-wide association study (GWAS) is a powerful tool for exploring associations between genetic variants and phenotypes. Joint analysis of multiple phenotypes can improve the accuracy and efficiency of association tests by drawing on information from several phenotypes at once (Allison et al., 1998; Zhou and Stephens, 2014). A growing number of multivariate methods have been proposed to analyze related phenotypes.

