Abstract

Gene set analysis is commonly used in functional enrichment and molecular pathway analyses. Most current methods are competitive tests, which assume that genes are independent of one another. However, the false discovery rates of competitive methods are inflated when they are applied to datasets with high inter-gene correlations. Self-contained testing methods could solve this problem, but they impose other restrictions on data characteristics. Therefore, a statistically rigorous testing method that is applicable to datasets with varied and complex characteristics is needed to obtain unbiased and comparable results. We propose a self-contained and competitive incorporated analysis (SCIA) to alleviate the bias caused by the limited application scope of existing gene set analysis methods. This is accomplished through a novel permutation strategy that uses a priori biological networks to selectively permute gene labels with different probabilities. In simulation studies, SCIA was compared with four representative analysis methods (GSEA, CAMERA, ROAST, and NES) and produced the best performance in both false discovery rate and sensitivity under most parameter settings. Further, in a KEGG pathway analysis of two real lung cancer datasets, SCIA found many more pathways than GSEA in both datasets, and most of them could be supported by the literature. Overall, SCIA promises to offer researchers more reliable and comparable results across different datasets.

Highlights

  • In recent years, gene set analysis (GSA) has become the most common method in functional genomics studies, because evaluating a single p-value for a gene set is statistically more powerful than genewise tests

  • The proportions of differentially expressed genes (DEGs) in a given functional gene set and in the background gene set are compared using a hypergeometric, binomial, or chi-square test. This comparison of DEG proportions is the original idea behind competitive testing

  • Self-contained methods assume, under their null hypothesis, that the gene set does not contain any genes whose expression levels are associated with the experimental conditions
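As an illustration of the competitive comparison of DEG proportions mentioned in the highlights, the following is a minimal sketch of a one-sided hypergeometric enrichment test. It is not the SCIA procedure or the code of any method named here; the function name, parameter names, and the example counts are assumptions made for illustration only.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_deg, set_size, deg_in_set):
    """Upper-tail hypergeometric p-value P(X >= deg_in_set).

    Hypothetical helper: X is the number of DEGs observed in a gene set of
    `set_size` genes drawn without replacement from `n_background` genes,
    of which `n_deg` are differentially expressed.
    """
    max_overlap = min(n_deg, set_size)
    total = comb(n_background, set_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_deg, k) * comb(n_background - n_deg, set_size - k)
        for k in range(deg_in_set, max_overlap + 1)
    )
    return tail / total


# Illustrative counts only (not taken from the paper): 20000 background genes,
# 1000 DEGs, and a gene set of 100 genes containing 15 DEGs.
p_value = hypergeom_enrichment_pvalue(20000, 1000, 100, 15)
print(f"enrichment p-value: {p_value:.3g}")
```

Note that this test treats genes as independent draws, which is exactly the assumption the abstract identifies as problematic when inter-gene correlations are high.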


Summary

INTRODUCTION

Gene set analysis (GSA) has become the most common method in functional genomics studies, because evaluating a single p-value for a gene set is statistically more powerful than genewise tests. Self-contained functional class scoring (FCS) methods seem to be more powerful than competitive ones and do not assume that all genes are independent, but their null hypothesis is usually overly restrictive (Goeman et al, 2004; Tian et al, 2005; Khatri et al, 2012): it assumes that the gene set does not contain any genes whose expression levels are associated with the experimental conditions.

The genes in the CSSPN are selectively permuted, instead of permuting all gene labels as usual. This procedure does not disrupt inter-gene correlations but uses inter-pathway information from a priori biological networks, which creates a platform for incorporating self-contained, competitive, and pathway topology (PT)-based methods. Other comprehensive networks, such as the integrated network of seven commonly used networks in Edge Set Enrichment Analysis (Han et al, 2015), can be used as the background network of SCIA.

