Abstract
Background: Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment.
Methods: We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. We then define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets.
Results: We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previously published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and of all three data sets. Both results showed that many gene sets could be identified with low false discovery rates, and the two results were consistent with each other. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method.
Conclusions: This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.
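The abstract says that a false discovery rate can be calculated from the concordant enrichment probabilities and used to rank gene sets, but it does not give the estimator. As a minimal sketch, assuming the common posterior-probability approach (an assumption, not confirmed by this text), the estimated FDR of a top-k list can be taken as the average of one minus the enrichment probability over the k top-ranked gene sets. The function name `rank_with_fdr` and the example probabilities are hypothetical.

```python
import numpy as np

def rank_with_fdr(enrich_prob):
    """Rank gene sets by enrichment probability and estimate the FDR of each top-k list.

    Assumption (not taken from the source): the FDR of the top-k list is the
    mean of (1 - probability) over the k highest-ranked gene sets.
    """
    enrich_prob = np.asarray(enrich_prob, dtype=float)
    # Sort gene sets from highest to lowest enrichment probability.
    order = np.argsort(-enrich_prob, kind="stable")
    # 1 - probability is the chance that a given call is a false discovery.
    false_prob = 1.0 - enrich_prob[order]
    # Running mean over the top-1, top-2, ... lists.
    fdr = np.cumsum(false_prob) / np.arange(1, len(enrich_prob) + 1)
    return order, fdr

# Hypothetical probabilities for three gene sets.
order, fdr = rank_with_fdr([0.90, 0.99, 0.50])
print(order, fdr)
```

Under this assumption the estimated FDR never decreases as the list grows, so a cutoff such as "largest k with estimated FDR below 0.05" is well defined.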
Highlights
Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level
What we observe from experiments may not indicate the underlying truth. (For example, a gene with slightly down-regulated differential expression may show a small positive t-type test value.) Although these categories are not observed in practice, they can be considered in a mixture model framework
We need to develop a mathematical formula for the probability of concordant enrichment score (CES) of a given gene set $S$ that contains $m_S$ genes: $\mathrm{CES}_S = \Pr(\text{gene set } S \text{ is concordantly enriched} \mid \text{observed data})$, which can be useful for prioritizing different gene sets in practice
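The highlights above describe unobserved states (negative change, no change, positive change) handled through a mixture model, and a set-level probability conditional on the observed data. The sketch below illustrates the idea under strong simplifying assumptions that are not taken from the source: the mixture parameters are fixed by hand rather than estimated, the test statistics from different data sets are independent given the state, and "concordantly enriched" is replaced by a hypothetical stand-in, namely that at least a fraction `min_frac` of the genes in the set share the same non-null state. All names and numbers here are illustrative.

```python
import numpy as np

def normal_pdf(z, mean, sd):
    """Density of N(mean, sd^2), evaluated elementwise at z."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def posterior_states(z, weights, means, sds):
    """Posterior over (negative, none, positive) for each gene.

    z has shape (n_genes, n_datasets); each component assumes the same state
    in every data set, with independent statistics given that state.
    """
    z = np.asarray(z, dtype=float)
    lik = np.stack(
        [np.prod(normal_pdf(z, m, s), axis=1) for m, s in zip(means, sds)],
        axis=1,
    )
    joint = lik * np.asarray(weights, dtype=float)
    return joint / joint.sum(axis=1, keepdims=True)

def prob_at_least(p, k):
    """Pr(at least k successes) for independent events with probabilities p."""
    p = np.asarray(p, dtype=float)
    dist = np.zeros(len(p) + 1)  # dist[j] = Pr(exactly j successes so far)
    dist[0] = 1.0
    for pi in p:
        dist[1:] = dist[1:] * (1.0 - pi) + dist[:-1] * pi
        dist[0] *= 1.0 - pi
    return float(dist[k:].sum())

def illustrative_ces(post, gene_idx, state, min_frac=0.5):
    """Hypothetical set-level score, not the paper's CES definition."""
    p = post[gene_idx, state]
    k = int(np.ceil(min_frac * len(p)))
    return prob_at_least(p, k)

# Hand-picked (hypothetical) mixture parameters: negative, none, positive.
WEIGHTS = (0.1, 0.8, 0.1)
MEANS = (-2.0, 0.0, 2.0)
SDS = (1.0, 1.0, 1.0)

# Four genes measured in two data sets; gene 0 shows a small positive value
# in the first data set but a clearly negative one in the second.
z = np.array([[0.3, -2.5], [3.0, 2.8], [2.5, 3.1], [0.1, -0.2]])
post = posterior_states(z, WEIGHTS, MEANS, SDS)
ces_up = illustrative_ces(post, gene_idx=[1, 2], state=2)
print(post.round(3))
print(ces_up)
```

In a real analysis the mixture parameters would be estimated from the data (for example with an EM algorithm), and the set-level event would follow the paper's own definition of concordant enrichment.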
Summary
Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. Although the integrative analysis of multiple gene expression data sets has been well studied in recent years [10,11], genome-wide concordance has not been well considered. Our purpose is to identify pathways or gene sets with concordant enrichment. Several methods have been published for meta gene set enrichment analysis of expression data [12,13], but they were not developed for this purpose. There is still a lack of methods and software for concordant integrative gene set enrichment analysis.