Abstract

Background: Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment.

Methods: We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. We then define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets.

Results: We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previously published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and of all three data sets. Both results showed that many gene sets could be identified with low false discovery rates, and the two results were consistent with each other. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method.

Conclusions: This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.

Highlights

  • Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level

  • What we observe from experiments may not indicate the underlying truth. (For example, a gene with slightly down-regulated differential expression may show a small positive t-type test value.) Although the true categories of differential expression (no change, positive change, negative change) are not observed in practice, they can be considered in a mixture model framework.

  • We need a mathematical formula for the concordant enrichment score (CES) of a given gene set S that contains m_S genes. The score is a probability: CES_S = Pr(gene set S is concordantly enriched | observed data). It can be useful for prioritizing different gene sets in practice.
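
The two ideas in the highlights above can be illustrated with a short Python sketch. It is not the authors' implementation. It makes several assumptions: the function names are hypothetical, the mixture parameters are made-up values, a univariate mixture for a single data set stands in for the three-component multivariate normal mixture, and the false discovery rate of a ranked list is taken to be the average of one minus the score over the selected gene sets (a common convention for posterior probabilities, which the source does not spell out).

```python
import numpy as np

def state_posteriors(z, weights, means, sds):
    """Posterior probability of each latent state for each observed statistic.

    Columns follow the order of the supplied components; here we assume
    (no change, positive change, negative change).
    """
    z = np.asarray(z, dtype=float)[:, None]
    w = np.asarray(weights, dtype=float)[None, :]
    mu = np.asarray(means, dtype=float)[None, :]
    sd = np.asarray(sds, dtype=float)[None, :]
    # Weighted normal density of every statistic under every component.
    dens = w * np.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    # Bayes' rule: normalize across components.
    return dens / dens.sum(axis=1, keepdims=True)

def ranked_fdr(ces):
    """Rank gene sets by descending score and estimate the FDR of each top-k list.

    Assumption: each score is a posterior probability, so the estimated FDR of
    the top-k list is the mean of (1 - score) over those k gene sets.
    """
    ces = np.asarray(ces, dtype=float)
    order = np.argsort(-ces)
    fdr = np.cumsum(1.0 - ces[order]) / np.arange(1, ces.size + 1)
    return order, fdr

# Illustrative (made-up) parameters: most genes unchanged, a few up or down.
weights = (0.8, 0.1, 0.1)
means = (0.0, 2.0, -2.0)
sds = (1.0, 1.0, 1.0)

# A small positive statistic still leaves some probability of negative change.
print(state_posteriors([0.3], weights, means, sds))

# Hypothetical scores for three gene sets, ranked with their estimated FDR.
print(ranked_fdr([0.90, 0.99, 0.50]))
```

The first call mirrors the example in the highlights: a small positive test value does not rule out a truly down-regulated gene, it only makes that state less probable.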


Summary

Introduction

Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. Although the integrative analysis of multiple gene expression data sets has been well studied in recent years [10,11], genome-wide concordance has not been well considered. Several methods have been published for meta gene set enrichment analysis of expression data [12,13], but they were not developed for our purpose of identifying pathways or gene sets with concordant enrichment. There is therefore still a lack of methods and software for concordant integrative gene set enrichment analysis.

