Abstract

Motivation: The Rank Product (RP) is a statistical technique widely used to detect differentially expressed features in molecular profiling experiments such as transcriptomics, metabolomics and proteomics studies. An implementation of the RP and the closely related Rank Sum (RS) statistics has been available in the RankProd Bioconductor package for several years. However, several recent advances in the understanding of the statistical foundations of the method have made a complete refactoring of the existing package desirable.

Results: We implemented a completely refactored version of the RankProd package, which provides a more principled implementation of the statistics for unpaired datasets. Moreover, the permutation-based P-value estimation methods have been replaced by exact methods, providing faster and more accurate results.

Availability and implementation: RankProd 2.0 is available at Bioconductor (https://www.bioconductor.org/packages/devel/bioc/html/RankProd.html) and as part of the mzMatch pipeline (http://www.mzmatch.sourceforge.net).

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Finding differentially expressed molecular features when comparing different conditions plays a pivotal role in all kinds of molecular profiling studies (“omics”)

  • Because unpaired datasets are increasingly common, we developed a more principled approach, described in Section 4, which allows a more reliable application of Rank Product (RP) and Rank Sum (RS) to the analysis of unpaired datasets

  • We introduce a method for the exact calculation of the RS p-values. This is derived from the simple observation that under the null hypothesis, the probability distribution of the RS, in an experiment with N variables and K replicates, is exactly the same as the probability distribution of the sum of the outcomes obtained by rolling K dice with N faces
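The dice analogy in the last highlight can be made concrete with a short sketch. The Python code below is an illustration only and is not the RankProd implementation (RankProd is an R package); the function names `rank_sum_pmf` and `rank_sum_pvalue` are hypothetical. It assumes that, under the null hypothesis, the rank of a feature in each of the K replicates is an independent draw from the uniform distribution on 1..N, and that small rank sums are the interesting ones, so the P-value is taken as the lower-tail probability P(RS ≤ observed).

```python
from fractions import Fraction


def rank_sum_pmf(n_features, k_replicates):
    """Exact distribution of the sum of k independent uniform ranks on 1..n.

    Equivalent to the sum of k fair dice with n faces each.
    Returns a dict mapping each possible sum to its exact probability.
    """
    counts = {0: 1}  # number of ways to reach each sum with zero dice
    for _ in range(k_replicates):
        new_counts = {}
        for partial_sum, ways in counts.items():
            for face in range(1, n_features + 1):
                key = partial_sum + face
                new_counts[key] = new_counts.get(key, 0) + ways
        counts = new_counts
    total = n_features ** k_replicates  # number of equally likely outcomes
    return {s: Fraction(ways, total) for s, ways in counts.items()}


def rank_sum_pvalue(observed, n_features, k_replicates):
    """Lower-tail P-value: probability that the rank sum is <= observed."""
    pmf = rank_sum_pmf(n_features, k_replicates)
    return sum(p for s, p in pmf.items() if s <= observed)


if __name__ == "__main__":
    # Two replicates, six features: the same as rolling two ordinary dice.
    print(rank_sum_pvalue(2, 6, 2))  # 1/36
    print(rank_sum_pvalue(7, 6, 2))  # 7/12
```

Exact fractions are used so that tail probabilities are not affected by floating-point rounding. This direct convolution is only practical for small N and K; a production implementation would need a more efficient scheme.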

Summary

Introduction

Finding differentially expressed molecular features when comparing different conditions plays a pivotal role in all kinds of molecular profiling studies (“omics”). The main identified weakness of the RP method is its sensitivity to variable-specific measurement variance. This problem has been successfully addressed by a number of variance-stabilizing normalization techniques (Durbin et al., 2002; Huber et al., 2002; Breitling and Herzyk, 2005). P-value estimation was previously performed by a permutation-based method for both statistics (Hong et al., 2006). This method requires a computationally demanding number of permutations to obtain accurate results, and its estimates are unreliable in the tails of the distribution, which is where the most interesting molecular features lie. Because unpaired datasets are increasingly common, we developed a more principled approach, described in Section 4, which allows a more reliable application of RP and RS to the analysis of unpaired datasets.

P-values estimation for the Rank Product
P-values estimation for the Rank Sum
Application to unpaired datasets
Conclusion