Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets

Yinglei Lai, Fanni Zhang, Timothy A McCaffrey, Tapan K Nayak, Norman H Lee, Reza Modarres

doi:10.1186/1471-2164-15-s1-s6


Open Access

https://doi.org/10.1186/1471-2164-15-s1-s6


Abstract

Background: Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment.

Methods: We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets.

Results: We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previously published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also of all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method.

Conclusions: This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.
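The mixture-model idea in the Methods paragraph can be illustrated with a minimal sketch. It computes, for each gene, the posterior probability of each of the three states (no change, positive change, negative change) from test statistics observed in several data sets. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `component_posteriors`, the diagonal covariance within each component, and the mixture parameters, which are simply given rather than estimated from data.

```python
import numpy as np


def component_posteriors(z, weights, means, sds):
    """Posterior state probabilities under a three-component normal mixture.

    z       : (n_genes, n_datasets) test statistics, one column per data set
    weights : (3,) mixture proportions for [no change, positive, negative]
    means   : (3, n_datasets) component means
    sds     : (3, n_datasets) component standard deviations
              (diagonal covariance is an illustrative simplification)
    """
    z = np.asarray(z, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)

    # Standardized residuals, shape (n_genes, 3, n_datasets).
    diff = (z[:, None, :] - means[None, :, :]) / sds[None, :, :]
    # Log normal density for every gene, component and data set.
    log_pdf = -0.5 * diff**2 - np.log(sds)[None, :, :] - 0.5 * np.log(2 * np.pi)
    # Log of (mixture weight * joint density across data sets).
    log_joint = np.log(weights)[None, :] + log_pdf.sum(axis=2)
    # Subtract the row maximum before exponentiating for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    # Bayes rule: normalize so each gene's probabilities sum to one.
    return post / post.sum(axis=1, keepdims=True)


# Hypothetical parameters for two data sets (not estimates from the paper).
weights = [0.8, 0.1, 0.1]
means = [[0.0, 0.0], [2.0, 2.0], [-2.0, -2.0]]
sds = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
z = [[3.0, 3.0], [0.0, 0.0], [-3.0, -3.0]]
print(np.round(component_posteriors(z, weights, means, sds), 3))
```

In practice the mixture parameters would have to be estimated from the data; that estimation step is outside the scope of this sketch.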

Highlights

Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level
What we observe from experiments may not indicate the underlying truth. (For example, a gene with slightly down-regulated expression may show a small positive t-type test value.) Although the underlying categories of differential expression are not observed in practice, they can be considered in a mixture model framework.
We need to develop a mathematical formula for the concordant enrichment score (CES), a probability, of a given gene set S that contains m_S genes: CES_S = Pr(gene set S is concordantly enriched | observed data), which can be useful for prioritizing different gene sets in practice
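Because the CES is a posterior probability, one common way to turn a ranked list of gene sets into a false discovery rate is to average 1 - CES over the top-ranked sets. The sketch below shows that calculation. It is an assumption that this matches the paper's exact FDR formula, which is not given in this excerpt; the function name `rank_by_ces` and the example scores are hypothetical.

```python
import numpy as np


def rank_by_ces(ces):
    """Rank gene sets by CES and estimate the FDR at each list length.

    ces : posterior probabilities that each gene set is concordantly enriched.
    Returns (order, fdr): indices of gene sets from highest to lowest CES,
    and the estimated FDR when the top 1, 2, ... gene sets are reported.
    """
    ces = np.asarray(ces, dtype=float)
    # Sort in decreasing CES; a stable sort keeps ties in input order.
    order = np.argsort(-ces, kind="stable")
    # 1 - CES is the posterior probability that a reported set is a false discovery.
    false_prob = 1.0 - ces[order]
    # Running mean of those probabilities over the top-k gene sets.
    fdr = np.cumsum(false_prob) / np.arange(1, len(ces) + 1)
    return order, fdr


# Hypothetical CES values for three gene sets.
order, fdr = rank_by_ces([0.9, 0.99, 0.5])
print(order, fdr)
```

The running mean is non-decreasing along the ranked list, so a cutoff such as "report gene sets until the estimated FDR exceeds a chosen level" is well defined.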

Summary

Introduction

Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. Although the integrative analysis of multiple gene expression data sets has been well studied in recent years [10,11], genome-wide concordance has not been well considered. Our purpose is to identify pathways or gene sets with concordant enrichment. Several methods have been published for meta gene set enrichment analysis of expression data [12,13], but they were not developed for this purpose. There is still a lack of methods and software for concordant integrative gene set enrichment analysis.



Journal: BMC Genomics	Publication Date: Jan 1, 2014
Citations: 43	License type: cc-by


