Abstract

Background: Gene set analysis is a valuable tool for summarizing high-dimensional gene expression data in terms of biologically relevant sets. This is an active area of research, and numerous gene set analysis methods have been developed. Despite this popularity, systematic comparative studies have been limited in scope.

Methods: In this study, we present a semi-synthetic simulation study using real datasets in order to test and compare commonly used methods.

Results: A software pipeline, Flexible Algorithm for Novel Gene set Simulation (FANGS), generates simulated data based on a prostate cancer dataset in which the KRAS and TGF-β pathways were differentially expressed. The FANGS software is compatible with other datasets and pathways. Comparisons are presented for Gene Set Enrichment Analysis (GSEA), Significance Analysis of Function and Expression (SAFE), sigPathway, and Correlation Adjusted Mean RAnk (CAMERA). All gene set analysis methods are tested using gene sets from the MSigDB knowledge base. The false positive rate and power are estimated and presented for comparison. Recommendations are made regarding the utility of each method's default settings and each method's sensitivity to various effect sizes.

Conclusions: The results of this study provide empirical guidance to users of gene set analysis methods. The FANGS software is available to researchers for continued methods comparisons.
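The general semi-synthetic idea (start from real expression data, inject a known effect into one target gene set, run a test, and count rejections to estimate the false positive rate and power) can be illustrated with a minimal sketch. This is not the FANGS implementation. It assumes a genes × samples matrix, assigns random case/control labels so the labels carry no real biological signal, and adds a fixed shift (`effect`) to the target set in the cases. The test `set_mean_perm_test` is a hypothetical stand-in (a label-permutation test on per-sample set means), not GSEA, SAFE, sigPathway, or CAMERA. All function names are illustrative, and the random matrix in the usage lines stands in for a real dataset.

```python
import numpy as np


def inject_effect(expr, set_idx, group, effect):
    """Copy of expr (genes x samples) with `effect` added to the genes in
    set_idx, only in samples labelled group == 1 (the simulated cases)."""
    out = expr.copy()
    cases = np.flatnonzero(group == 1)
    out[np.ix_(set_idx, cases)] += effect
    return out


def set_mean_perm_test(expr, set_idx, group, n_perm, rng):
    """Hypothetical stand-in test: two-sided difference in the per-sample
    mean of the gene set between groups, with a label-permutation p-value."""
    score = expr[set_idx].mean(axis=0)  # one set score per sample

    def stat(g):
        return abs(score[g == 1].mean() - score[g == 0].mean())

    observed = stat(group)
    null = np.array([stat(rng.permutation(group)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


def rejection_rate(expr, set_idx, effect, n_rep=200, n_perm=199, alpha=0.05, seed=0):
    """Fraction of replicates in which the target set is called significant.
    effect == 0 estimates the false positive rate; effect > 0 estimates power."""
    rng = np.random.default_rng(seed)
    n_samples = expr.shape[1]
    half = n_samples // 2
    base = np.repeat([0, 1], [half, n_samples - half])
    hits = 0
    for _ in range(n_rep):
        # Randomly assigned labels are unrelated to the underlying data, so any
        # systematic group difference in the target set comes from the injection.
        group = rng.permutation(base)
        simulated = inject_effect(expr, set_idx, group, effect)
        if set_mean_perm_test(simulated, set_idx, group, n_perm, rng) <= alpha:
            hits += 1
    return hits / n_rep


# Stand-in for a real expression matrix (500 genes x 40 samples).
data_rng = np.random.default_rng(1)
expr = data_rng.normal(size=(500, 40))
target = np.arange(20)  # hypothetical 20-gene target set
fpr = rejection_rate(expr, target, effect=0.0)
power = rejection_rate(expr, target, effect=1.0)
```

In a real comparison, the stand-in test would be replaced by each method under study, and the additive shift is only one possible way to inject differential expression.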

Highlights

  • Gene expression data, especially at the whole genome level, is a powerful tool in modern genomics

  • We have developed software to create semi-synthetic simulations based on real data to compare the performance of some of the most popular pathway analysis methods

  • Statistical power for all gene set analysis (GSA) methods tested under their default settings, using the ischemic stroke dataset with the MARTORIATI_MDM4_TARGETS_NEUROEPITHELIUM_UP pathway targeted for differential expression

Summary

Introduction

Gene expression data, especially at the whole genome level, is a powerful tool in modern genomics. In gene set analysis (GSA), genes are grouped into sets that share a biological property, such as membership in the same pathway. The resulting gene sets are analyzed as a whole to determine which of these properties are relevant to the phenotype of interest. Such an analysis typically aims to generate hypotheses about the mechanisms underlying the phenotype, which should then be validated in replication studies or interrogated functionally in laboratory experiments. A number of GSA methods have been developed for gene expression data and have led to novel biological hypotheses about important clinical conditions. These methods have suggested new avenues for therapeutic intervention based on the unexpected involvement of biological functions and pathways in a variety of disease processes. Gene set analysis is a valuable tool for summarizing high-dimensional gene expression data in terms of biologically relevant sets. This is an active area of research, and numerous gene set analysis methods have been developed. However, systematic comparative studies have been limited in scope.
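As a concrete illustration of scoring a gene set as a whole, the sketch below computes the unweighted running-sum enrichment score that underlies GSEA, one of the methods compared in this study. It is a simplification, not the GSEA software: GSEA's default statistic weights each step by the gene's ranking metric, and significance is assessed by permutation, both of which are omitted here. The function name and its inputs are illustrative assumptions.

```python
import numpy as np


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    ranked_genes: gene IDs ordered from most up- to most down-regulated.
    gene_set: collection of gene IDs forming the set being tested.
    """
    members = set(gene_set)
    hits = np.array([g in members for g in ranked_genes])
    n, n_hit = len(ranked_genes), int(hits.sum())
    if n_hit == 0 or n_hit == n:
        raise ValueError("the set must contain some, but not all, ranked genes")
    # Walk down the list: step up at set members, down at non-members.
    # Step sizes are scaled so the walk returns to zero at the end of the list.
    steps = np.where(hits, 1.0 / n_hit, -1.0 / (n - n_hit))
    running = np.cumsum(steps)
    # The score is the signed maximum deviation of the walk from zero.
    return float(running[np.argmax(np.abs(running))])


ranked = [f"g{i}" for i in range(1, 11)]
top = enrichment_score(ranked, {"g1", "g2", "g3"})      # set concentrated at the top
bottom = enrichment_score(ranked, {"g8", "g9", "g10"})  # set concentrated at the bottom
```

A set whose members cluster near the top of the ranking gives a score near +1, one clustered near the bottom gives a score near -1, and a set scattered evenly through the list stays close to zero.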
