Abstract

Genome-wide expression studies are a powerful genomic technology for quantifying the expression dynamics of genes in a genome. In gene expression studies, gene set analysis has become the first choice for gaining insights into the underlying biology of diseases or stresses in plants. It also reduces the complexity of statistical analysis and enhances the explanatory power of the results obtained from the primary downstream differential expression analysis. Gene set analysis approaches are well developed for microarray and RNA-seq gene expression data. These approaches mainly focus on analyzing gene sets with gene ontology or pathway annotation data. However, in plant biology, such methods may not establish any formal relationship between genotypes and phenotypes, as most traits are quantitative and controlled by polygenes. The existing Quantitative Trait Loci (QTL)-based gene set analysis approaches focus only on over-representation analysis of the selected genes while ignoring their associated gene scores. Therefore, we developed an innovative statistical approach, GSQSeq, to analyze gene sets with trait-enriched QTL data. This approach considers the differential expression scores associated with the genes while analyzing the gene sets. The performance of the developed method was tested on five gene expression datasets obtained from real crop gene expression studies. Our analytical results indicated that trait-specific analysis of gene sets was more robust and successful with the proposed approach than with existing techniques. Further, the developed method provides a valuable platform for integrating gene expression data with QTL data.

Highlights

  • Gene expression (GE) studies including RNA sequencing (RNA-seq) and microarrays are powerful techniques for studying expression dynamics and regulation of genes in human and non-human genomes

  • The earlier Gene Set Validation with QTL (GSVQ) and Gene Set Analysis with QTL (GSAQ) approaches were based on over-representation analysis (ORA) of the Quantitative Trait Loci (QTL)-hit genes in the selected gene set, using the hypergeometric distribution [25,26]

  • We explored the ability of the proposed GSQSeq and the existing methods, including GSVQ and GSAQ, to provide biologically meaningful insights using the real high-throughput GE datasets derived from RNA-seq and microarrays
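
The over-representation analysis mentioned in the highlights can be illustrated with a small sketch. This is not the GSVQ or GSAQ implementation; it only shows the generic upper-tail hypergeometric test that ORA relies on. The function name `ora_pvalue` and its parameter names are hypothetical, chosen here for illustration.

```python
from math import comb


def ora_pvalue(total_genes, qtl_hit_genes, selected_genes, selected_qtl_hits):
    """Upper-tail hypergeometric p-value P(X >= selected_qtl_hits).

    total_genes       : genes in the background (hypothetical name)
    qtl_hit_genes     : background genes that fall inside QTL regions
    selected_genes    : size of the selected gene set
    selected_qtl_hits : selected genes that fall inside QTL regions
    """
    max_hits = min(qtl_hit_genes, selected_genes)
    # Number of ways to draw the selected gene set from the background.
    denom = comb(total_genes, selected_genes)
    # Sum the probability mass for k = observed hits up to the maximum possible.
    tail = sum(
        comb(qtl_hit_genes, k) * comb(total_genes - qtl_hit_genes, selected_genes - k)
        for k in range(selected_qtl_hits, max_hits + 1)
    )
    return tail / denom


# Toy numbers: 10 background genes, 4 in QTL regions, 3 selected, 2 of them QTL hits.
p = ora_pvalue(10, 4, 3, 2)
```

Note that this test uses only the counts of selected genes; the gene-level differential expression scores play no role, which is the limitation the abstract attributes to ORA-based approaches.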


Summary

Background

Gene expression (GE) studies, including RNA sequencing (RNA-seq) and microarrays, are powerful techniques for studying the expression dynamics and regulation of genes in human and non-human genomes. Apart from gene ontology (GO) and pathways, other biological information, such as Quantitative Trait Loci (QTL), is available in public domain databases and can be used effectively in gene set analysis (GSA) to gain biological insights into the etiology of complex diseases in humans as well as in non-human organisms (e.g., plants). For this purpose, statistical approaches and tools were developed to perform GSA with genetically enriched QTL data for GE microarray studies [25]. We evaluated the proposed method's performance against the existing approaches, including GSVQ and GSAQ, using performance metrics such as False Discovery Rate (FDR) and −log10(p-value) on multiple real crop datasets. For this evaluation, we used five GE datasets obtained from microarray and RNA-seq studies in rice. We implemented the proposed statistical approach in a freely available R software package for the benefit of users.
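
As a rough illustration of the two metrics named above, the sketch below computes FDR-adjusted p-values and −log10(p-value) scores for a list of gene set p-values. The summary does not state which FDR procedure was used; the Benjamini-Hochberg step-up adjustment here is an assumption, and the function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def neg_log10(pvalue):
    """Return -log10(p); a larger score means a smaller p-value."""
    return -math.log10(pvalue)


# Toy p-values for four gene sets (illustrative numbers, not from the study).
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_fdr = benjamini_hochberg(toy_pvalues)
toy_scores = [neg_log10(p) for p in toy_pvalues]
```

`neg_log10` raises `ValueError` for a p-value of exactly zero, so p-values that underflow would need a floor before scoring.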

Real Microarray Datasets
Pre-Processing of Rice Microarrays Datasets
Real RNA-Seq Dataset
RNA-Seq Preprocessing and Read Alignment
Transcript Assembly and Quantification
Notations
Preparation of Gene Ranked List
Proposed GSQSeq Approach
Rice RNA-Seq Data
Rice Microarray Datasets
Proposed Approach for Gene Set Analysis with QTLs
R Software Package
Findings
Conclusions