Abstract

Background: Competitive gene set analysis is a standard exploratory tool for gene expression data. Permutation-based competitive gene set analysis methods are preferable to parametric ones because the latter make strong statistical assumptions which are not always met. For permutation-based methods, we permute samples, as opposed to genes, as doing so preserves the inter-gene correlation structure. Unfortunately, up until now, sample permutation-based methods have required a minimum of six replicates per sample group.

Results: We propose a new permutation-based competitive gene set analysis method for multi-group gene expression data with as few as three replicates per group. The method is based on an advanced sample permutation technique that utilizes all groups within a data set for pairwise comparisons. We present a comprehensive evaluation of different permutation techniques using multiple data sets, and contrast the performance of our method, mGSZm, with other state-of-the-art methods. We show that mGSZm is robust and that, despite using fewer than six replicates, we are able to consistently identify a high proportion of the top-ranked gene sets from the analysis of a substantially larger data set. Further, we highlight other methods whose performance is highly variable and appears dependent on the underlying data set being analyzed.

Conclusions: Our results demonstrate that robust gene set analysis of multi-group gene expression data is permissible with as few as three replicates. In doing so, we have extended the applicability of such approaches to resource-constrained experiments where additional data generation is prohibitively difficult or expensive.
An R package implementing the proposed method and supplementary materials are available from the website http://ekhidna.biocenter.helsinki.fi/downloads/pashupati/mGSZm.html.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1403-0) contains supplementary material, which is available to authorized users.

Highlights

  • Competitive gene set analysis is a standard exploratory tool for gene expression data

  • Compared gene set analysis methods: we evaluated mGSZm together with several gene set analysis methods from the literature: Generally Applicable Gene-set Enrichment (GAGE) [21], Correlation Adjusted MEan RAnk gene set test (CAMERA) [13], Quantitative Set Analysis of Gene Expression (QuSAGE) [22], and Allez [14] (Table 1)

  • Note that we have shown improved performance of Modified Gene Set Analysis (mGSA) compared to Gene Set Analysis (GSA) in our previous article [16]. Weighted Kolmogorov-Smirnov (wKS) is our version of the gene set analysis method Gene Set Enrichment Analysis (GSEA) [12]

Introduction

Competitive gene set analysis is a standard exploratory tool for gene expression data. Permutation-based competitive gene set analysis methods are preferable to parametric ones because the latter make strong statistical assumptions which are not always met. While sample permutation-based approaches do not make such strong assumptions, a large number of permutations is necessary for accurate P-value estimation. Mishra et al. recently showed that the popular choice of 1000 permutations is inadequate and results in a loss of precision visible in the tail end of the gene set score distribution [16]. Besides making the P-values inaccurate, this means that the relative ranking of gene sets with the same P-value is arbitrary.
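
The resolution limit described above can be illustrated with a minimal sketch. It assumes a commonly used empirical estimator, (b + 1) / (B + 1), where B is the number of permutations and b is the number of permuted scores at least as large as the observed one; the estimator actually used by the authors is not stated in this section. The null scores are simulated stand-ins rather than real permuted gene set scores, and the function and variable names are hypothetical.

```python
import random


def empirical_p_value(observed, null_scores):
    """Empirical P-value, assumed estimator (b + 1) / (B + 1)."""
    # b: number of null (permuted) scores at least as large as the observed score
    b = sum(1 for s in null_scores if s >= observed)
    return (b + 1) / (len(null_scores) + 1)


rng = random.Random(0)
n_permutations = 1000

# Simulated stand-in for gene set scores obtained from permuted sample labels.
null_scores = [rng.gauss(0.0, 1.0) for _ in range(n_permutations)]

# Two hypothetical gene sets whose observed scores differ a lot,
# but which both lie beyond every permuted score.
p_moderate = empirical_p_value(10.0, null_scores)
p_extreme = empirical_p_value(20.0, null_scores)

# No P-value can fall below this floor with B permutations.
smallest_possible = 1 / (n_permutations + 1)

print(p_moderate, p_extreme, smallest_possible)
```

Both gene sets receive the same P-value, the floor of 1/1001, so their relative order cannot be recovered from the P-values alone. Under this assumed estimator, separating them requires either more permutations or a different way of scoring the tail.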
