Comparative evaluation of gene-set analysis methods

Qi Liu, Yutaka Yasui, Irina Dinu, Adeniyi J Adewale, John D Potter

doi:10.1186/1471-2105-8-431

Open Access

https://doi.org/10.1186/1471-2105-8-431

Abstract

Background

Multiple data-analytic methods have been proposed for evaluating gene-expression levels in specific biological pathways, assessing differential expression associated with a binary phenotype. Following Goeman and Bühlmann's recent review, we compared statistical performance of three methods, namely Global Test, ANCOVA Global Test, and SAM-GS, that test "self-contained null hypotheses" via subject sampling. The three methods were compared based on a simulation experiment and analyses of three real-world microarray datasets.

Results

In the simulation experiment, we found that the use of the asymptotic distribution in the two Global Tests leads to a statistical test with an incorrect size. Specifically, p-values calculated by the scaled χ² distribution of Global Test and the asymptotic distribution of ANCOVA Global Test are too liberal, while the asymptotic distribution with a quadratic form of the Global Test results in p-values that are too conservative. The two Global Tests with permutation-based inference, however, gave a correct size. While the three methods showed similar power using permutation inference after a proper standardization of gene expression data, SAM-GS showed slightly higher power than the Global Tests. In the analysis of a real-world microarray dataset, the two Global Tests gave markedly different results, compared to SAM-GS, in identifying pathways whose gene expressions are associated with p53 mutation in cancer cell lines. A proper standardization of gene expression variances is necessary for the two Global Tests in order to produce biologically sensible results. After the standardization, the three methods gave very similar biologically-sensible results, with slightly higher statistical significance given by SAM-GS. The three methods gave similar patterns of results in the analysis of the other two microarray datasets.

Conclusion

An appropriate standardization makes the performance of all three methods similar, given the use of permutation-based inference. SAM-GS tends to have slightly higher power in the lower α-level region (i.e. gene sets that are of the greatest interest). Global Test and ANCOVA Global Test have the important advantage of being able to analyze continuous and survival phenotypes and to adjust for covariates. A free Microsoft Excel Add-In to perform SAM-GS is available from .
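To illustrate why standardizing gene expression variances can matter for a covariance-based gene-set score, here is a minimal Python sketch. It is not the paper's procedure: the per-gene scaling to unit standard deviation, the helper names `standardize_genes` and `squared_covariances`, and the toy numbers are all assumptions made for illustration. The score summed here (squared covariance between each gene and the centered binary phenotype) only mimics the general shape of a Global Test type statistic.

```python
import statistics

def standardize_genes(expr):
    """Center each gene (row) and scale it to unit sample SD across subjects.

    expr: list of rows, one row per gene, one value per subject.
    Genes with zero variance are centered but left unscaled.
    """
    out = []
    for row in expr:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)
        scale = sd if sd > 0 else 1.0
        out.append([(x - mean) / scale for x in row])
    return out

def squared_covariances(expr, labels):
    """Per-gene squared sample covariance with the centered phenotype."""
    y_mean = statistics.fmean(labels)
    y_centered = [y - y_mean for y in labels]
    n = len(labels)
    result = []
    for row in expr:
        g_mean = statistics.fmean(row)
        cov = sum((x - g_mean) * y for x, y in zip(row, y_centered)) / (n - 1)
        result.append(cov ** 2)
    return result

# Made-up toy data: gene A shifts clearly between groups on a small scale,
# gene B is very noisy on a large scale with little group difference.
labels = [0, 0, 0, 1, 1, 1]
gene_a = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]
gene_b = [100.0, -100.0, 50.0, -50.0, 100.0, -40.0]

raw = squared_covariances([gene_a, gene_b], labels)
scaled = squared_covariances(standardize_genes([gene_a, gene_b]), labels)
print("raw contributions:   ", raw)
print("scaled contributions:", scaled)
```

On the raw scale the noisy, high-variance gene B contributes most of the summed score, whereas after scaling the contribution of gene A, which actually separates the groups, is the larger one.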

Highlights

Multiple data-analytic methods have been proposed for evaluating gene-expression levels in specific biological pathways, assessing differential expression associated with a binary phenotype
After the standardization, the p-values of the three methods agreed with each other, although the p-values from Significance Analysis of Microarray (SAM)-GS tended to be smaller than those from Global Test and ANCOVA Global Test in the lower range of p-values (Table 2, Figures 7 and 8). This is consistent with the power-comparison simulation, in which SAM-GS showed slightly higher power than the Global Tests at small α levels, even after the standardization
We suggest that, when Global Test and ANCOVA Global Test are used for the analysis of microarray data, permutations should always be used for the calculation of statistical significance
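The permutation-based inference recommended above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the statistic is a sum of squared t-like statistics loosely modeled on SAM-GS, the fixed constant `s0` is a hypothetical stand-in (SAM estimates its own constant from the data), and the function names and toy data are invented for this example. Phenotype labels are shuffled across subjects, which keeps the correlation structure among genes intact.

```python
import random
import statistics

def gene_set_statistic(expr, labels, s0=0.1):
    """Sum over genes of squared t-like statistics d = diff / (se + s0).

    expr: list of rows, one row per gene in the set, one value per subject.
    labels: 0/1 phenotype per subject; each group needs at least 2 subjects.
    s0: hypothetical fixed constant added to the standard error.
    """
    total = 0.0
    for row in expr:
        g1 = [x for x, y in zip(row, labels) if y == 1]
        g0 = [x for x, y in zip(row, labels) if y == 0]
        diff = statistics.fmean(g1) - statistics.fmean(g0)
        n1, n0 = len(g1), len(g0)
        # Pooled variance and standard error of the difference in means.
        pooled_var = ((n1 - 1) * statistics.variance(g1)
                      + (n0 - 1) * statistics.variance(g0)) / (n1 + n0 - 2)
        se = (pooled_var * (1 / n1 + 1 / n0)) ** 0.5
        total += (diff / (se + s0)) ** 2
    return total

def permutation_p_value(expr, labels, n_perm=2000, seed=0):
    """Estimate a p-value by shuffling phenotype labels across subjects."""
    rng = random.Random(seed)
    observed = gene_set_statistic(expr, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if gene_set_statistic(expr, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Made-up toy gene set: both genes are shifted upward in group 1.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = [
    [1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1],
    [5.0, 5.3, 4.8, 5.1, 6.0, 6.4, 5.9, 6.2],
]
p_value = permutation_p_value(expr, labels)
print("permutation p-value:", p_value)
```

Because only the labels are permuted, the null distribution is built from the observed data rather than from an asymptotic approximation, which is the property the recommendation above relies on. With only eight subjects the number of distinct label splits is small, so the attainable p-values are coarse.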

Summary

Introduction

Multiple data-analytic methods have been proposed for evaluating gene-expression levels in specific biological pathways, assessing differential expression associated with a binary phenotype. Following Goeman and Bühlmann's recent review, we compared statistical performance of three methods, namely Global Test, ANCOVA Global Test, and SAM-GS, that test "self-contained null hypotheses" via subject sampling. Some microarray-based gene expression analyses, such as Significance Analysis of Microarray (SAM) [1], aim to discover individual genes whose expression levels are associated with a phenotype of interest. Such individual-gene analyses can be enhanced by utilizing existing knowledge of biological pathways, or sets of individual genes (hereafter referred to as "gene sets"), that are linked via. Goeman and Bühlmann argue, and we agree, that the framework of competitive hypothesis testing via gene sampling is subject to serious errors in calculating and interpreting the statistical significance of gene sets, because of its implicit or explicit untenable assumption of probabilistic independence across genes.

Journal: BMC Bioinformatics	Publication Date: Nov 7, 2007
Citations: 122	License type: cc-by

Similar Papers

Improving gene set analysis of microarray data by SAM-GS.
Irina Dinu ... Konrad S Famulski
BMC Bioinformatics | VOL. 8
05 Jul 2007

A comparative study on gene-set analysis methods for assessing differential expression associated with the survival phenotype
Seungyeoun Lee ... Sunho Lee
BMC Bioinformatics | VOL. 12
26 Sep 2011

Testing Differential Gene Expression in Functional Groups
R Meister ... U Mansmann
Methods of Information in Medicine | VOL. 44
01 Jan 2004

Linear combination test for gene set analysis of a continuous phenotype
Irina Dinu ... Saumyadipta Pyne
BMC Bioinformatics | VOL. 14
01 Jul 2013
