Abstract

Background

Many gene-set analysis methods have been proposed and compared for binary phenotypes, through simulation studies and analyses of real datasets. We focused on the survival phenotype and compared the performance of Gene Set Enrichment Analysis (GSEA), Global Test (GT), Wald-type Test (WT) and Global Boost Test (GBST) in a simulation study and on two ovarian cancer data sets. We considered two versions of GSEA with different weights: GSEA1 uses equal weights, yielding results similar to the Kolmogorov-Smirnov test, while GSEA2 uses weights based on the correlation between each gene and the phenotype.

Results

We compared GSEA1, GSEA2, GT, WT and GBST in a simulation study with various settings for the correlation structure of the genes and for the association parameter between the survival outcome and the genes. The simulation results indicated that GT, WT and GBST consistently have higher power than GSEA1 and GSEA2 across all scenarios. However, the power of all five tests depends on the combination of correlation structure and association parameter. For the ovarian cancer data set, at an FDR threshold of q < 0.1, GT, WT and GBST detected 12, 6 and 8 significant pathways, respectively, whereas neither GSEA1 nor GSEA2 detected any. In addition, three of the pathways found significant by GT, WT and GBST (Purine metabolism, Leukocyte transendothelial migration, and Jak-STAT signaling pathway) overlapped with those reported in previous ovarian cancer microarray studies.

Conclusion

Simulation studies and a real data example indicate that GT, WT and GBST tend to have high power, whereas GSEA1 and GSEA2 have lower power. We also found that, when survival is positively associated with the genes, the power of all five tests is much higher when the genes are correlated than when they are independent. There appears to be a synergistic effect in detecting significant gene sets when the significant genes have within-class correlation and the association between survival and genes runs in one direction (all positive or all negative).
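
To make the GSEA1/GSEA2 distinction and the q < 0.1 cut-off concrete, here is a minimal sketch. It assumes the standard weighted running-sum form of the GSEA enrichment score, in which each gene-set member is weighted by |score|^p: p = 0 gives equal weights (the Kolmogorov-Smirnov-like GSEA1) and p = 1 gives correlation-based weights (GSEA2). The per-gene scores are taken as given; for a survival outcome they might be, for example, univariate Cox statistics, but that is an assumption, not the authors' stated choice. The q-values use the Benjamini-Hochberg procedure, also an assumption here. The function names and toy data are hypothetical, and permutation-based significance is not computed.

```python
def enrichment_score(scores, in_set, p=1.0):
    """Signed maximum deviation of a weighted running sum (GSEA-style sketch).

    scores : per-gene association with the phenotype (hypothetical input);
             genes are ranked by it, most positive first.
    in_set : booleans marking which genes belong to the gene set.
    p      : weight exponent; p=0 gives equal weights (GSEA1-like),
             p=1 weights each set member by |score| (GSEA2-like).
    Assumes the set is non-empty and does not contain every gene.
    """
    ranked = sorted(zip(scores, in_set), key=lambda t: -t[0])
    hit_w = [abs(s) ** p if hit else 0.0 for s, hit in ranked]
    total_hit = sum(hit_w)
    n_miss = sum(1 for _, hit in ranked if not hit)
    cum_hit, cum_miss, best = 0.0, 0, 0.0
    for w, (_, hit) in zip(hit_w, ranked):
        cum_hit += w
        cum_miss += 0 if hit else 1
        dev = cum_hit / total_hit - cum_miss / n_miss  # P_hit - P_miss at this rank
        if abs(dev) > abs(best):
            best = dev
    return best


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


# Toy example: five genes ranked by a hypothetical per-gene score.
scores = [3.0, 2.0, 1.0, -1.0, -2.0]
in_set = [True, False, True, False, False]
print(enrichment_score(scores, in_set, p=0))  # GSEA1-like, about 0.667
print(enrichment_score(scores, in_set, p=1))  # GSEA2-like, 0.75

q = bh_qvalues([0.01, 0.04, 0.03, 0.2])
print([i for i, v in enumerate(q) if v < 0.1])  # [0, 1, 2]
```

With equal weights, both set members move the running sum by the same amount; with p = 1, the top-ranked member (score 3.0) dominates. This is why a correlation-weighted score favours sets whose strongest genes sit at the extreme of the ranking.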

Highlights

  • Many gene-set analysis methods have been previously proposed and compared through simulation studies and analysis of real datasets for binary phenotypes

  • We focused on the survival phenotype and compared five different gene-set analysis tests, GSEA1, GSEA2, Global Test (GT), Wald-type Test (WT) and Global Boost Test (GBST), in a simulation study and on two ovarian cancer data sets

  • We found that GT, WT, and GBST are more powerful than GSEA1 and GSEA2

Summary

Introduction

Many gene-set analysis methods have been proposed and compared for binary phenotypes, through simulation studies and analyses of real datasets. Mootha et al [1] proposed Gene Set Enrichment Analysis (GSEA), based on the Kolmogorov-Smirnov statistic, which measures the maximum degree of differential gene expression in a gene set across a binary phenotype. Dinu et al [4] described some critical problems with GSEA and proposed an alternative method, SAM-GS, which extends an individual-gene analysis method, Significance Analysis of Microarrays (SAM), to gene-set analysis. They compared SAM-GS with GSEA using a mouse microarray dataset with simulated gene sets, and showed an advantage of SAM-GS over GSEA in the analysis of three real microarray datasets.
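
The SAM-GS idea can be sketched for a binary phenotype. This sketch assumes the usual form: each gene gets a SAM d-statistic, the mean difference divided by (pooled standard error + s0); the set statistic is the sum of squared d-statistics over the genes in the set; and significance comes from permuting the phenotype labels. The fudge constant s0 is fixed at a hypothetical 0.1 here, whereas SAM normally derives it from the data. The function names and toy data are likewise illustrative assumptions.

```python
import random
from statistics import mean


def sam_d(x1, x2, s0=0.1):
    """SAM-style d-statistic for one gene; s0 is a hypothetical fixed constant."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = mean(x1), mean(x2)
    ss = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
    se = ((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2)) ** 0.5  # pooled standard error
    return (m1 - m2) / (se + s0)


def samgs_stat(expr, labels, s0=0.1):
    """Sum of squared d-statistics over the genes of one gene set.

    expr   : one list of expression values per gene (samples in the same order).
    labels : 0/1 phenotype label per sample.
    """
    total = 0.0
    for gene in expr:
        x1 = [v for v, g in zip(gene, labels) if g == 1]
        x0 = [v for v, g in zip(gene, labels) if g == 0]
        total += sam_d(x1, x0, s0) ** 2
    return total


def samgs_pvalue(expr, labels, n_perm=1000, s0=0.1, seed=0):
    """Permutation p-value: shuffling sample labels keeps each sample's genes
    together, so the correlation between genes in the set is preserved."""
    rng = random.Random(seed)
    observed = samgs_stat(expr, labels, s0)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if samgs_stat(expr, perm, s0) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy gene set: two genes, three samples per phenotype group.
expr = [[5.0, 6.0, 5.5, 1.0, 1.5, 1.2],
        [4.0, 4.2, 4.1, 0.0, 0.3, 0.1]]
labels = [1, 1, 1, 0, 0, 0]
print(samgs_stat(expr, labels))
print(samgs_pvalue(expr, labels, n_perm=1000))
```

Squaring the d-statistics makes the set statistic sensitive to genes shifted in either direction. With only six samples, however, there are just 20 distinct label splits, so even a perfectly separated set cannot reach a permutation p-value much below 0.1.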
