Abstract

Background: Current methods for gene-set or pathway analysis are usually designed to test the enrichment of a single gene-set. Once the analysis is carried out for each of the sets under study, a list of significant sets can be obtained. However, if one wishes to further prioritize the importance or strength of association of these sets, no such quantitative measure is available. Using the magnitude of p-value to rank the pathways may not be appropriate because p-value is not a measure for strength of significance. In addition, when testing each pathway, these analyses are often implicitly affected by the number of differentially expressed genes included in the set and/or affected by the dependence among genes.

Results: Here we propose a two-stage procedure to prioritize the pathways/gene-sets. In the first stage we develop a pathway-level measure with three properties. First, it contains all genes (differentially expressed or not) in the same set, and summarizes the collective effect of all genes per sample. Second, this pathway score accounts for the correlation between genes by synchronizing their correlation directions. Third, the score includes a rank transformation to enhance the variation among samples as well as to avoid the influence of extreme heterogeneity among genes. In the second stage, all scores are included simultaneously in a Bayesian logistic regression model which can evaluate the strength of association for each set and rank the sets based on posterior probabilities. Simulations from Gaussian distributions and human microarray data, and a breast cancer study with RNA-Seq are considered for demonstration and comparison with other existing methods.

Conclusions: The proposed summary pathway score provides for each sample an overall evaluation of gene expression in a gene-set. It demonstrates the advantages of including all genes in the set and the synchronization of correlation direction. The simultaneous utilization of all pathway-level scores in a Bayesian model not only offers a probabilistic evaluation and ranking of the pathway association but also presents good accuracy in identifying the top-ranking pathways. The resulting recommendation list of ranked pathways can be a reference for potential target therapy or for future allocation of research resources.
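
The three properties of the pathway-level measure can be illustrated with a minimal sketch. The exact formula of the proposed score is not given in this text, so everything below is an assumption made for illustration: each gene is rank-transformed across samples, the first gene is used as a hypothetical reference direction, genes whose ranks correlate negatively with that reference are flipped, and the aligned ranks are averaged per sample. The function names `average_ranks`, `pearson`, and `pathway_score` are likewise hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks across samples; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pathway_score(expr):
    """Illustrative per-sample score for one gene-set (assumed form, not the paper's).

    expr: list of genes, each a list of expression values, one per sample.
    All genes are used, whether differentially expressed or not.
    """
    n_samples = len(expr[0])
    ranked = [average_ranks(gene) for gene in expr]  # rank transformation
    reference = ranked[0]  # assumption: first gene defines the direction
    aligned = []
    for r in ranked:
        if pearson(r, reference) < 0:
            # Synchronize direction: reverse the ranks of opposing genes.
            r = [n_samples + 1 - v for v in r]
        aligned.append(r)
    # Collective effect of all genes, one value per sample.
    return [sum(g[s] for g in aligned) / len(aligned) for s in range(n_samples)]


# Toy gene-set: gene_b moves opposite to gene_a across the four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [8.0, 6.0, 4.0, 2.0]
gene_c = [0.5, 0.7, 0.6, 0.9]
score = pathway_score([gene_a, gene_b, gene_c])
```

In this toy set, averaging the raw ranks would let `gene_a` and `gene_b` cancel each other; after the flip, both contribute in the same direction, which is the intuition behind synchronizing correlation directions.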

Highlights

  • Simulation settings: In the following simulation studies, we compare the Bayesian approach with other methods: gene-set enrichment analysis (GSEA), over-representation analysis (ORA), the global test, frequentist logistic regression with the proposed pathway score (denoted as Logistic), frequentist logistic regression with the average expression level as the pathway score (Logistic), and Fisher’s method

  • The disease status was determined based on the logistic regression model described above with the intercept β0 set at 0.01 for a prevalence of 1% and all other regression coefficients set at 0 for no association, or at other given values if association is assumed
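
The disease-status step described in the highlights can be sketched as follows. This is a minimal illustration, not the simulation code of the study: it assumes the standard logit link, standard normal pathway scores, and hypothetical names (`simulate_status`, `expit`, `logit`). Under the logit link, a baseline probability of 0.01 corresponds to an intercept of log(0.01/0.99) on the log-odds scale; this text does not state the scale on which the quoted intercept value is expressed, so the intercept is left as a parameter.

```python
import math
import random


def expit(x):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def simulate_status(scores, beta0, betas, rng):
    """Draw one 0/1 disease status per sample from a logistic model.

    scores: list of samples, each a list of pathway scores
    beta0:  intercept on the log-odds scale
    betas:  one regression coefficient per pathway (0 means no association)
    """
    statuses = []
    for row in scores:
        eta = beta0 + sum(b * s for b, s in zip(betas, row))
        statuses.append(1 if rng.random() < expit(eta) else 0)
    return statuses


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_samples, n_pathways = 2000, 5

# Assumption: standard normal pathway scores, for illustration only.
scores = [[rng.gauss(0.0, 1.0) for _ in range(n_pathways)]
          for _ in range(n_samples)]

# Null scenario: every coefficient is 0, so only the intercept matters.
null_betas = [0.0] * n_pathways
status = simulate_status(scores, logit(0.01), null_betas, rng)
observed_rate = sum(status) / n_samples
```

Setting some entries of `betas` to non-zero values gives the scenarios in which association is assumed.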

Summary

Introduction

Current methods for gene-set or pathway analysis are usually designed to test the enrichment of a single gene-set. Several gene-set analyses (GSAs) treat genes in the same pathway as independent, although such genes can be correlated because they participate in the same or related biological functions [10–12]. This correlation can inflate type I error rates and reduce the power of both univariate and multivariate tests [8, 13, 14]. Another concern is the criterion for including genes in a GSA. Some analyses, including over-representation analysis (ORA), use only genes that are differentially expressed (DE) and exclude those with mild or weak effects. As Rahmatallah and colleagues [9] point out, the power of a gene-set analysis may be influenced by the number of DE genes in the set.
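
To make the dependence on the number of DE genes concrete, the following minimal sketch computes an ORA p-value with the usual one-sided hypergeometric test. It is a generic textbook formulation and not code from any of the cited methods; the function name `ora_pvalue` and the gene counts are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe, n_de, set_size, n_de_in_set):
    """Upper-tail hypergeometric probability P(X >= n_de_in_set).

    n_universe:  number of genes measured
    n_de:        number of genes called differentially expressed
    set_size:    number of genes in the gene-set
    n_de_in_set: number of DE genes that fall in the gene-set
    """
    upper = min(n_de, set_size)
    total = comb(n_universe, set_size)
    tail = sum(
        comb(n_de, k) * comb(n_universe - n_de, set_size - k)
        for k in range(n_de_in_set, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes, 500 called DE, a gene-set of 50 genes.
p_few_de = ora_pvalue(10_000, 500, 50, 3)    # 3 DE genes in the set
p_many_de = ora_pvalue(10_000, 500, 50, 10)  # 10 DE genes in the set
```

Only the count of DE genes enters the test, so genes with mild or weak effects that miss the DE cutoff contribute nothing to the evidence for the set.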
