Abstract

Motivation: Much research effort has been devoted to identifying enriched gene sets in microarray experiments. However, the gene sets identified are often inconsistent across independent studies, probably owing to the noisy nature of microarray data combined with the small sample sizes of individual studies. Combining information from multiple studies is therefore likely to improve the detection of truly enriched gene classes. As more data become available, statistical methods are needed that integrate information from multiple studies (meta-analysis) to improve the power to identify enriched gene sets.

Results: We propose a Bayesian model that provides a coherent framework for jointly modeling gene set information and gene expression data from multiple studies, improving the detection of enriched gene sets by leveraging all available sources of information. A distinctive feature of our method is that it models the gene expression data directly, rather than using summary statistics, when synthesizing studies. In addition, the proposed model is flexible and offers an appropriate treatment of the between-study heterogeneity that frequently arises in meta-analysis of microarray experiments. We show that under our Bayesian model the full posterior conditionals all have known distributions, which greatly facilitates MCMC computation. Simulation results show that the proposed method can improve the power of gene set enrichment meta-analysis compared with the existing methods developed by Shen and Tseng (2010, Bioinformatics, 26, 1316–1323), and that it is not sensitive to mild or moderate deviations from the distributional assumption for gene expression data. We illustrate the proposed method through an application combining eight lung cancer datasets for gene set enrichment analysis, which demonstrates its usefulness.

Availability: http://qbrc.swmed.edu/software/

Contact: Min.Chen@UTSouthwestern.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • While microarray analysis initially focused on identifying differentially expressed (DE) genes, increasing attention has been paid to pathway or gene set analysis, which aims to detect altered biological pathways or other pre-defined gene classes rather than individual genes (e.g. Barry et al, 2008; Efron and Tibshirani, 2007; Hosack et al, 2003; Kim and Volsky, 2005; Subramanian et al, 2005; Tian et al, 2005)

  • We then apply the MAPE methods with two other available options, the minimum P-value statistic and Fisher's statistic (Fisher), and find that the meta-analysis for pathway enrichment integrated (MAPE_I) method appears to perform better than or comparably to meta-analysis for pathway enrichment at the pathway level (MAPE_P) and at the gene level (MAPE_G) in identifying positive controls while avoiding negative controls

  • We have proposed a fully integrated Bayesian model for meta-analysis of gene set enrichment using multiple genomic studies, and developed an efficient Gibbs sampler for posterior computation and inference, where all the steps can be done by direct sampling from known distributions
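The second highlight mentions two ways of combining per-study evidence: the minimum P-value statistic and Fisher's statistic. The source does not say how MAPE calibrates these statistics, so the sketch below is not MAPE itself. It is a minimal illustration, assuming k independent studies whose p-values are uniform under the null, using the textbook closed-form null distributions: Fisher's X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom, and the minimum P-value m satisfies P(min p_i ≤ m) = 1 - (1 - m)^k. The function names are hypothetical.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method (assumes independent studies, uniform p-values under H0).

    X = -2 * sum(log p_i) ~ chi-square with 2k df. For even df the survival
    function has the closed form P(X > x) = exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of the series
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def minp_combined_p(pvalues):
    """Minimum P-value method: P(min p_i <= m) = 1 - (1 - m)^k under H0."""
    k = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** k


if __name__ == "__main__":
    # Hypothetical p-values for one gene set in three independent studies.
    per_study = [0.04, 0.20, 0.11]
    print("Fisher:", fisher_combined_p(per_study))
    print("minP:  ", minp_combined_p(per_study))
```

Fisher's statistic rewards consistent moderate evidence across all studies, while the minimum P-value statistic is driven by the single strongest study, which is one reason meta-analysis methods are often compared under both.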


Summary

INTRODUCTION

While microarray analysis initially focused on identifying differentially expressed (DE) genes, increasing attention has been paid to pathway or gene set analysis, which aims to detect altered biological pathways or other pre-defined gene classes rather than individual genes (e.g. Barry et al, 2008; Efron and Tibshirani, 2007; Hosack et al, 2003; Kim and Volsky, 2005; Subramanian et al, 2005; Tian et al, 2005). Although plenty of gene expression data are publicly available, it remains challenging to integrate gene set enrichment information from multiple genomic studies targeting the same biological problem. Our Bayesian method provides a natural way to synthesize data by incorporating the model and parameter uncertainties involved in all studies. It furnishes an integrated Bayesian framework for jointly modeling gene expression data from multiple studies together with gene set information. This allows researchers to conduct differential expression analysis, gene set enrichment analysis and meta-analysis simultaneously, all based on objective Bayesian posterior inference, which may yield more reliable scientific findings than existing sequential approaches.
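The abstract states that under the proposed model every full posterior conditional has a known distribution, so a Gibbs sampler can update each parameter by direct sampling. The paper's actual model, which also links genes to gene sets, is not reproduced here. As a minimal sketch, assuming a toy one-gene random-effects model y_sj ~ N(θ_s, σ²) for observation j in study s, with θ_s ~ N(μ, τ²), a normal prior on μ and inverse-gamma priors on σ² and τ², the code shows how conjugacy makes every Gibbs step a direct draw. Here τ² plays the role of between-study heterogeneity. The function names, priors and hyperparameters (`a0`, `b0`, `prior_var_mu`) are hypothetical choices for illustration.

```python
import random
import statistics


def inv_gamma(shape, rate, rng):
    """Draw from inverse-gamma(shape, rate) as 1 / Gamma(shape, scale=1/rate)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def gibbs_random_effects(studies, n_iter=2000, burn_in=500, seed=1,
                         a0=1.0, b0=1.0, prior_var_mu=100.0):
    """Toy Gibbs sampler for y_sj ~ N(theta_s, sigma2), theta_s ~ N(mu, tau2).

    `studies` is a list of non-empty lists, one list of observations per study.
    Returns the post-burn-in draws of the pooled effect mu.
    """
    rng = random.Random(seed)
    S = len(studies)
    N = sum(len(y) for y in studies)
    theta = [statistics.fmean(y) for y in studies]  # start at the study means
    mu, sigma2, tau2 = statistics.fmean(theta), 1.0, 1.0
    mu_draws = []
    for it in range(n_iter):
        # theta_s | rest: normal (conjugate normal-normal update per study)
        for s, y in enumerate(studies):
            v = 1.0 / (len(y) / sigma2 + 1.0 / tau2)
            m = v * (sum(y) / sigma2 + mu / tau2)
            theta[s] = rng.gauss(m, v ** 0.5)
        # mu | rest: normal, given a N(0, prior_var_mu) prior
        v = 1.0 / (S / tau2 + 1.0 / prior_var_mu)
        mu = rng.gauss(v * sum(theta) / tau2, v ** 0.5)
        # sigma2 | rest: inverse-gamma (within-study noise)
        sse = sum((x - theta[s]) ** 2 for s, y in enumerate(studies) for x in y)
        sigma2 = inv_gamma(a0 + N / 2.0, b0 + sse / 2.0, rng)
        # tau2 | rest: inverse-gamma (between-study heterogeneity)
        ssb = sum((t - mu) ** 2 for t in theta)
        tau2 = inv_gamma(a0 + S / 2.0, b0 + ssb / 2.0, rng)
        if it >= burn_in:
            mu_draws.append(mu)
    return mu_draws


if __name__ == "__main__":
    # Simulated data: four studies with slightly different true effects.
    data_rng = random.Random(0)
    studies = [[data_rng.gauss(1.0 + d, 1.0) for _ in range(20)]
               for d in (-0.3, 0.0, 0.2, 0.4)]
    draws = gibbs_random_effects(studies)
    print("posterior mean of mu:", round(statistics.fmean(draws), 3))
```

Because each conditional is a standard normal or inverse-gamma distribution, no Metropolis tuning is needed; this is the computational advantage the abstract attributes to the proposed model, shown here only on a much simpler model.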

SIMULATION
Simulation I
Simulation II
Simulation III
DATA EXAMPLE
DISCUSSION
