Evaluating Gene Set Enrichment Analysis via a Hybrid Data Model

Jianping Hua, Michael L Bittner, Edward R Dougherty

doi:10.4137/cin.s13305

Abstract

Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most existing comparison studies focus on whether GSA methods can produce accurate P-values; however, practitioners are often more concerned with whether the methods rank gene sets correctly. Ranking performance is closely related to two critical goals of GSA methods: revealing biological themes and ensuring reproducibility, especially in small-sample studies. We have conducted a comprehensive simulation study of the ranking performance of seven representative GSA methods. We overcome the limited availability of real data sets by creating hybrid data models from existing large data sets. To build a data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set, and multiple smaller data sets can be generated by resampling the original data set. This approach lets us generate a large batch of data sets for checking the ranking performance of GSA methods. Our simulation study reveals that, for the proposed data model, the Q2 type GSA methods generally perform better than other GSA methods and the global test gives the most robust results. The properties of a data set play a critical role in performance: for data sets with highly connected genes, all GSA methods suffer significantly.
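
The construction described in the abstract (pick a master gene, derive phenotype labels from it, then resample smaller data sets) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says the labels are generated artificially, so the median-threshold rule below is an assumption made for illustration, and the function names `make_hybrid_labels` and `resample`, the toy expression data, and the sample sizes are all hypothetical.

```python
import random
import statistics


def make_hybrid_labels(expression, master_gene):
    """Assign a binary phenotype label to each sample from one master gene.

    ASSUMPTION: labels are produced by thresholding the master gene at its
    median. The source does not specify the labeling rule.
    """
    values = expression[master_gene]
    cutoff = statistics.median(values)
    return [1 if v > cutoff else 0 for v in values]


def resample(expression, labels, n_per_class, rng):
    """Draw a smaller data set: n_per_class samples per label, without replacement."""
    chosen = []
    for cls in (0, 1):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        chosen.extend(rng.sample(idx, n_per_class))
    sub_expr = {g: [vals[i] for i in chosen] for g, vals in expression.items()}
    sub_labels = [labels[i] for i in chosen]
    return sub_expr, sub_labels


# Toy stand-in for a large real data set: 5 genes x 40 samples (hypothetical values).
rng = random.Random(0)
n_samples = 40
expression = {
    f"gene{k}": [rng.gauss(0, 1) for _ in range(n_samples)] for k in range(5)
}

# One hybrid data model: gene0 plays the role of the master gene (ground truth).
labels = make_hybrid_labels(expression, "gene0")

# Several small-sample data sets drawn from the same hybrid model.
small_sets = [resample(expression, labels, 5, rng) for _ in range(3)]
```

Choosing a different master gene yields a different hybrid model from the same expression data, which is how one large data set can supply many ground truths. Under this sketch, a GSA method would then be scored by how highly it ranks the gene sets containing the master gene on each small data set.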

Journal: Cancer Informatics
Publication Date: Jan 1, 2014
Citations: 7
License type: cc-by-nc
