A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments

Tom Heskes, Rob Eisinga, Rainer Breitling

doi:10.1186/preaccept-1857144210135244

Open Access

Abstract

The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is accurate calculation of the p-value of the rank product statistic to adequately address multiple testing. Exact calculation, permutation, and gamma approximation have all been proposed to determine molecule-level significance. These approaches have serious drawbacks: they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution. We derive strict lower and upper bounds to the exact p-value, along with an accurate approximation, that can be used to assess the significance of the rank product statistic in a computationally fast manner. The bounds and the proposed approximation are shown to provide far better accuracy than existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis on transcriptomic profiling performed in blood. We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm provides an order of magnitude increase in throughput compared with current approaches and offers the opportunity to explore new application domains with even larger multiple testing issues. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip.

Highlights

The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments
Time performance and accuracy: The R program computes the bounds and the geometric mean p-value approximation very quickly. It takes approximately 2 milliseconds to calculate the upper bound p-value of any rank product ρ in the range 1 to n^k, for n = 10000 and k = 4, on an HP desktop computer using the interpreted R language running under Windows 7 with an Intel Core i7 CPU at 2.9 GHz.
The proposed algorithm runs very fast and gives a slightly conservative upper bound protecting against false positives and a close approximate estimate of the true p-values
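
To make concrete what the "true p-value" refers to, the sketch below computes it by brute-force counting for small n and k. It assumes the usual null model in which each of the k ranks is drawn independently and uniformly from 1 to n, so that the p-value of a rank product ρ is the fraction of rank tuples whose product is at most ρ. For comparison it also evaluates one common form of the gamma approximation, which treats r/(n+1) as uniform on (0, 1); that particular form is an assumption here. The function names exact_pvalue and gamma_pvalue are illustrative and are not taken from the authors' R code, and this is not the bounds algorithm proposed in the paper.

```python
from fractions import Fraction
from functools import lru_cache
from math import exp, factorial, log


def exact_pvalue(rho, n, k):
    """P(rank product <= rho) for k independent ranks, each uniform on 1..n."""

    @lru_cache(maxsize=None)
    def count(limit, remaining):
        # Number of rank tuples of length `remaining` whose product is <= limit.
        if limit < 1:
            return 0
        if remaining == 0:
            return 1
        # Fix the first rank r; the rest must have product <= limit // r.
        return sum(count(limit // r, remaining - 1)
                   for r in range(1, min(n, limit) + 1))

    return Fraction(count(rho, k), n ** k)


def gamma_pvalue(rho, n, k):
    """Gamma approximation (assumed form): -sum(log(r_i / (n + 1))) ~ Gamma(k, 1)."""
    x = k * log(n + 1) - log(rho)
    # Upper tail of Gamma(k, 1) for integer k: exp(-x) * sum_{j<k} x^j / j!
    return exp(-x) * sum(x ** j / factorial(j) for j in range(k))


# Tiny example: n = 3 molecules, k = 2 replicates.
# Only the tuple (1, 1) has product <= 1, so the exact p-value is 1/9.
p_exact = exact_pvalue(1, 3, 2)
p_gamma = gamma_pvalue(1, 3, 2)
print(p_exact, float(p_exact), p_gamma)
```

Using Fraction keeps the exact result free of rounding error. The counting recursion is only practical for small n and k, which is the kind of computational burden that motivates fast bounds and approximations.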

Summary

Introduction

The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is accurate calculation of the p-value of the rank product statistic to adequately address multiple testing. Exact calculation, permutation, and gamma approximation have all been proposed to determine molecule-level significance. A simple and widely used non-parametric statistical method, initially introduced by Breitling et al. [1] for gene expression microarrays, is to rank the molecules within each experiment in order of evidence for differential expression and to calculate the product of the ranks across experiments. The rank product method is used to combine ranked lists in gene expression profiling and in various other postgenomic datasets with ranked scores, including proteomics and metabolomics [6,7,8]. Such ranking is important because only a limited number of candidate molecules (transcripts, proteins, or metabolites) can usually be followed up in a typical biological downstream analysis for confirmation or further study. As a useful side effect of this feature, the rank product test becomes increasingly conservative as larger fractions of the set of molecules studied are differentially expressed: if all molecules are changing to the same extent, their rank ordering will again be random.
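
The ranking-and-multiplying step described above can be sketched in a few lines. This is a minimal illustration, assuming that a larger score means stronger evidence for differential expression (so the largest score gets rank 1) and that there are no ties; the data values and the function names ranks_descending and rank_products are hypothetical.

```python
from math import prod


def ranks_descending(scores):
    """Rank scores within one experiment: largest score gets rank 1."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_products(experiments):
    """experiments: k lists of n scores; returns one rank product per molecule."""
    per_experiment = [ranks_descending(scores) for scores in experiments]
    n = len(experiments[0])
    return [prod(r[g] for r in per_experiment) for g in range(n)]


# Hypothetical scores: k = 3 replicate experiments, n = 5 molecules.
experiments = [
    [2.1, 0.3, -0.5, 1.2, 0.0],
    [1.8, 0.1, -0.2, 1.5, 0.4],
    [2.5, -0.1, 0.2, 0.9, 0.3],
]
rp = rank_products(experiments)
print(rp)  # [1, 60, 100, 8, 36]
```

The first molecule ranks first in every replicate and so has the smallest possible rank product, 1; small rank products are the ones whose p-values lie in the tail of the distribution.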

Objectives

Methods

Results

Conclusion