A Comparison of Gene Set Analysis Methods in Terms of Sensitivity, Prioritization and Specificity

Adi L Tarca, Roberto Romero, Gaurav Bhatti

doi:10.1371/journal.pone.0079217

Abstract

Identification of functional sets of genes associated with conditions of interest from omics data was first reported in 1999, and since then a plethora of enrichment methods have been published for the systematic analysis of gene set collections, including Gene Ontology and biological pathways. Despite their widespread use in reducing the complexity of omics experiment results, their performance is poorly understood. Leveraging the existence of disease-specific gene sets in the KEGG and Metacore® databases, we compared the performance of sixteen methods under relaxed assumptions using 42 real datasets (over 1,400 samples). Most of the methods ranked the gene sets designed for specific diseases highly whenever samples from affected individuals were compared against controls via microarrays. The top methods for gene set prioritization differed from the top methods in terms of sensitivity, and four of the sixteen methods had large false positive rates, as assessed by permuting the phenotype of the samples. Among the methods that generated reasonably low false positive rates when phenotypes were permuted, the best overall were PLAGE, GLOBALTEST, and PADOG. The best method in the category that generated higher than expected false positives was MRGSE.
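The abstract scores methods partly by how highly they rank the disease-specific (target) gene set and partly by how many gene sets they call significant once the phenotype labels are permuted. The sketch below shows one plausible way to compute those two quantities. The function names, the `GeneSetTest` signature, and the exact metric definitions (percentage rank of the target set; mean fraction of sets with p < 0.05 across label shuffles) are our assumptions for illustration, not necessarily the definitions used by the authors.

```python
import random
from typing import Callable, Dict, Sequence

# Hypothetical signature: a gene set test takes (expression data, sample labels)
# and returns {gene set name: p-value}.
GeneSetTest = Callable[[object, list], Dict[str, float]]


def target_rank_percent(pvalues: Dict[str, float], target: str) -> float:
    """Rank of the target gene set (1 = smallest p-value) as a percentage of
    the number of gene sets tested; lower values mean better prioritization.
    Ties keep the dict's insertion order because sorted() is stable."""
    ordered = sorted(pvalues, key=pvalues.__getitem__)
    return 100.0 * (ordered.index(target) + 1) / len(ordered)


def permutation_false_positive_rate(
    test: GeneSetTest,
    expr: object,
    labels: Sequence[int],
    n_perm: int = 50,
    alpha: float = 0.05,
    seed: int = 0,
) -> float:
    """Mean fraction of gene sets with p < alpha after shuffling the phenotype
    labels. Shuffling removes any real association, so a well-calibrated
    method should return a value close to alpha on average."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        pvals = test(expr, shuffled)
        rates.append(sum(p < alpha for p in pvals.values()) / len(pvals))
    return sum(rates) / len(rates)


# Usage with toy p-values (not results from the paper):
print(target_rank_percent({"a": 0.01, "b": 0.2, "c": 0.001, "d": 0.5}, "a"))  # prints 50.0
```

Passing the method as a callable keeps the evaluation loop independent of any particular gene set analysis implementation.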

Highlights

As soon as microarrays became available [1], scientists faced the challenge of interpreting the high volume of data generated from these technologies, as a typical experiment comparing two groups of samples can result in hundreds or thousands of genes being identified as differentially expressed between groups.
The association between the phenotype and the sample-level gene set scores can be tested with classical statistical models. This is an important advantage over Functional Class Scoring (FCS) methods, because very complex designs can be analyzed in this way while adjusting for relevant covariates. The methods in this category that we considered in this work were: Pathway Level Analysis of Gene Expression (PLAGE) [22], Z-score [23], Single Sample Gene Set Enrichment Analysis (SSGSEA) [24] and Gene Set Variation Analysis (GSVA) [25].
These results suggest that, on average, a) the gene sets designed by KEGG and Metacore were relevant to those conditions, and b) the datasets we selected captured the nature of those phenotypes.
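The second highlight describes a two-step design: first compute one gene set score per sample, then relate those scores to the phenotype with an ordinary statistical test. The sketch below illustrates this with a Z-score style score, assuming the commonly used formulation (each gene standardized across samples, then the set's z-scores summed and divided by the square root of the set size); it is not the reference implementation of [23]. The association step is reduced to a two-group Welch t statistic; as the highlight notes, any classical model (for example, a linear model with covariates) could be used instead, and a p-value would come from the t distribution. PLAGE, SSGSEA and GSVA use different scoring rules and are not shown.

```python
import math
import statistics
from typing import Dict, Iterable, List


def zscore_gene_set_scores(expr: Dict[str, List[float]], gene_set: Iterable[str]) -> List[float]:
    """One score per sample: each gene is standardized across samples, and the
    z-scores of the set's genes are summed and divided by sqrt(k), where k is
    the number of set genes present in `expr`. Genes must not be constant."""
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("none of the gene set's genes are in the expression data")
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    for g in genes:
        values = expr[g]
        mu, sd = statistics.fmean(values), statistics.stdev(values)
        for j, v in enumerate(values):
            totals[j] += (v - mu) / sd
    return [t / math.sqrt(len(genes)) for t in totals]


def welch_t(values: List[float], labels: List[int]) -> float:
    """Welch t statistic comparing samples labeled 1 against samples labeled 0."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


# Toy data (not from the paper): keys are genes, lists hold values for 4 samples.
expr = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [2.0, 4.0, 6.0, 8.0]}
scores = zscore_gene_set_scores(expr, ["g1", "g2", "g_missing"])
print(welch_t(scores, [0, 0, 1, 1]))
```

Because the phenotype enters only in the second step, the same per-sample scores can be reused for several contrasts or covariate-adjusted models.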

Summary

Introduction

As soon as microarrays became available [1], scientists faced the challenge of interpreting the high volume of data generated from these technologies, as a typical experiment comparing two groups of samples can result in hundreds or thousands of genes being identified as differentially expressed between groups. Even when a high-throughput experiment fails to demonstrate significant changes at the gene level, for instance because of a modest effect size or a small sample size (both common in the field), gene set analysis is still relevant. This is because certain gene set analysis methods can use modest but coordinated changes in expression to establish a link between the phenotype and a predefined group of functionally related genes. In the Species Translation Challenge (https://www.sbvimprover.com), a large international effort for systems biology verification, the effect of various stimuli on the transcriptome was expected to translate, to a certain extent, between rat and human at the gene set level rather than at the individual gene level.
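To illustrate how a set-level statistic can pick up modest but coordinated shifts, here is a sketch of a generic self-contained test. It is our own illustrative choice, not one of the sixteen methods as implemented by the authors: it averages per-gene Welch t statistics over the genes in the set and estimates a two-sided p-value by permuting the sample labels. Permuting whole samples keeps the correlation among genes intact, and averaging lets many small shifts in the same direction add up to a detectable set-level signal. The function names and defaults are hypothetical.

```python
import math
import random
import statistics
from typing import Dict, Iterable, List


def welch_t(values: List[float], labels: List[int]) -> float:
    """Welch t statistic comparing samples labeled 1 against samples labeled 0."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def mean_t(expr: Dict[str, List[float]], gene_set: Iterable[str], labels: List[int]) -> float:
    """Average of the per-gene t statistics over set genes present in `expr`."""
    return statistics.fmean([welch_t(expr[g], labels) for g in gene_set if g in expr])


def self_contained_pvalue(
    expr: Dict[str, List[float]],
    gene_set: Iterable[str],
    labels: List[int],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Two-sided permutation p-value for |mean t| under shuffled sample labels."""
    genes = list(gene_set)  # materialize once; reused on every permutation
    rng = random.Random(seed)
    observed = abs(mean_t(expr, genes, labels))
    hits = 0
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        if abs(mean_t(expr, genes, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Toy data (not from the paper): 2 genes, 4 controls (0) followed by 4 cases (1).
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "g2": [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(self_contained_pvalue(expr, ["g1", "g2"], labels, n_perm=200, seed=1))
```

With very small groups the number of distinct label permutations is limited (70 for 4 versus 4 samples here), which bounds how small the permutation p-value can meaningfully be.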



Journal: PLoS ONE
Publication Date: Nov 15, 2013
Citations: 178
License type: CC0 1.0


Similar Papers

Pathways of the Heart
Rahul C Deo ... Frederick P Roth
Circulation: Cardiovascular Genetics | VOL. 2 | 01 Aug 2009

Specificity and False Positive Rates of the Test of Memory Malingering, Rey 15-Item Test, and Rey Word Recognition Test Among Forensic Inpatients With Intellectual Disabilities
Christopher M Love ... Shanna Jordan Zanolini
Assessment | VOL. 21 | 26 Mar 2014

Gene Set Overlap: An Impediment to Achieving High Specificity in Over-representation Analysis
Anthony Kusalik ... Farhad Maleki
01 Jan 2019

On the impact of model selection on predictor identification and parameter inference
Ruth M Pfeiffer ... Raymond J Carroll
Computational Statistics | VOL. 32 | 22 Oct 2016
