Abstract

Background: There are significant limitations in existing methods for the genome-wide identification of genes whose expression patterns affect traits.

Results: The transcriptomes of five tissues from 27 genetically diverse maize inbred lines were deeply sequenced to identify genes exhibiting high and low levels of expression variation across tissues or genotypes. Transcription factors are enriched among genes with the most variation in expression across tissues, as well as among genes with higher-than-median levels of variation in expression across genotypes. In contrast, transcription factors are depleted among genes whose expression is either highly stable or highly variable across genotypes. We developed a Bayesian-based method for genome-wide association studies (GWAS) in which RNA-seq-based measures of transcript accumulation are used as explanatory variables (eRD-GWAS). The ability of eRD-GWAS to identify true associations between gene expression variation and phenotypic diversity is supported by analyses of RNA co-expression networks, protein–protein interaction networks, and gene regulatory networks. Genes associated with 13 traits were identified using eRD-GWAS on a panel of 369 maize inbred lines. Predicted functions of many of the resulting trait-associated genes are consistent with the analyzed traits. Importantly, transcription factors are significantly enriched among trait-associated genes identified with eRD-GWAS.

Conclusions: eRD-GWAS is a powerful tool for associating genes with traits and is complementary to SNP-based GWAS. Our eRD-GWAS results are consistent with the hypothesis that genetic variation in transcription factor expression contributes substantially to phenotypic diversity.
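To make the idea of using expression levels as explanatory variables concrete, the following is a minimal sketch on simulated data. It is not the authors' Bayesian model, which the abstract does not specify. As a stand-in it uses the posterior mean under a simple Gaussian prior on gene effects, which is equivalent to ridge regression and assumes unit residual variance. The function name, the `prior_precision` parameter, and all simulated values are hypothetical.

```python
import numpy as np

def posterior_mean_effects(expression, trait, prior_precision=1.0):
    """Posterior mean of per-gene effects under a Gaussian prior (ridge regression).

    expression: (n_lines, n_genes) matrix of expression measures.
    trait:      (n_lines,) vector of trait values.
    Assumption: unit residual variance, so prior_precision acts as the ridge penalty.
    """
    # Standardize each gene so effect sizes are comparable across genes.
    X = (expression - expression.mean(axis=0)) / expression.std(axis=0)
    y = trait - trait.mean()
    n_genes = X.shape[1]
    # Solve (X'X + lambda * I) beta = X'y.
    A = X.T @ X + prior_precision * np.eye(n_genes)
    return np.linalg.solve(A, X.T @ y)

# Simulated example: one gene whose expression drives the trait.
rng = np.random.default_rng(1)
n_lines, n_genes, causal_gene = 200, 50, 7
expression = rng.normal(size=(n_lines, n_genes))
trait = 2.0 * expression[:, causal_gene] + rng.normal(size=n_lines)

effects = posterior_mean_effects(expression, trait)
top_gene = int(np.argmax(np.abs(effects)))
print("gene with the largest estimated effect:", top_gene)
```

In this toy setting all genes are fitted jointly rather than one marker at a time, and the gene with the largest absolute effect is taken as the trait-associated candidate.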

Highlights

  • There are significant limitations in existing methods for the genome-wide identification of genes whose expression patterns affect traits

  • RNA-seq was conducted on mRNA extracted from multiple maize organs collected from the 27 inbred founders of the nested association mapping (NAM) population

  • We modeled per-gene read counts with negative binomial distributions and used the scaled log of the over-dispersion parameters of quasi-negative binomial generalized linear models to minimize the correlation between expression variation and expression levels (“Methods”)
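As an illustration of why an over-dispersion parameter can serve as a measure of variation that is decoupled from expression level, here is a minimal sketch. It relies on the standard negative binomial relationship Var = mu + phi * mu^2 and estimates phi by the method of moments. This is a simplification: it does not reproduce the quasi-negative binomial generalized linear model fit or the scaling of the log described in the article's Methods. The function name and the simulated values are hypothetical.

```python
import numpy as np

def nb_overdispersion(counts):
    """Method-of-moments over-dispersion estimate for each gene (row).

    Under a negative binomial model, Var = mu + phi * mu**2,
    so phi = (Var - mu) / mu**2. Negative estimates are clipped to 0,
    and genes with zero mean are returned as NaN.
    """
    counts = np.asarray(counts, dtype=float)
    mu = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        phi = (var - mu) / mu**2
    phi = np.where(mu > 0, phi, np.nan)
    return np.clip(phi, 0.0, None)

# Simulate two genes with the same mean expression but different over-dispersion.
rng = np.random.default_rng(0)
mean, n_samples = 100.0, 2000
true_phi = np.array([0.05, 1.0])
size = 1.0 / true_phi            # NumPy's "n" parameter
prob = size / (size + mean)      # NumPy's "p" parameter
counts = np.vstack([
    rng.negative_binomial(s, p, n_samples) for s, p in zip(size, prob)
])

phi_hat = nb_overdispersion(counts)
print("estimated over-dispersion per gene:", phi_hat)
```

Both simulated genes have the same mean, so any difference in the estimates reflects variability alone, which is the property the highlight above describes.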


Summary

Introduction

There are significant limitations in existing methods for the genome-wide identification of genes whose expression patterns affect traits. Genome-wide association studies (GWAS) seek associations between genetic variants (e.g., SNPs) and variation in trait values. Typical approaches to GWAS exploit linkage disequilibrium (LD) between such variants and loci that directly affect traits of interest. Mixed linear model (MLM) solutions have been developed to overcome the confounding effects of population structure and relatedness among individuals, and they provide increased computational efficiency and statistical power [2,3,4,5]. Typical MLM solutions estimate effects based on single markers and require the use of covariances to account for population structure. These approaches for controlling for population structure decrease statistical

