Abstract
Motivation: The Rank Product (RP) is a statistical technique widely used to detect differentially expressed features in molecular profiling experiments such as transcriptomics, metabolomics and proteomics studies. An implementation of the RP and the closely related Rank Sum (RS) statistics has been available in the RankProd Bioconductor package for several years. However, several recent advances in the understanding of the statistical foundations of the method have made a complete refactoring of the existing package desirable.
Results: We implemented a completely refactored version of the RankProd package, which provides a more principled implementation of the statistics for unpaired datasets. Moreover, the permutation-based P-value estimation methods have been replaced by exact methods, providing faster and more accurate results.
Availability and implementation: RankProd 2.0 is available at Bioconductor (https://www.bioconductor.org/packages/devel/bioc/html/RankProd.html) and as part of the mzMatch pipeline (http://www.mzmatch.sourceforge.net).
Supplementary information: Supplementary data are available at Bioinformatics online.
Highlights
Finding differentially expressed molecular features when comparing different conditions plays a pivotal role in all kinds of molecular profiling studies (“omics”)
Given that unpaired datasets are increasingly common, we developed a more principled approach, described in Section 4, which provides a more reliable application of Rank Product (RP) and Rank Sum (RS) in the analysis of unpaired datasets
We introduce a method for the exact calculation of the RS p-values. This is derived from the simple observation that under the null hypothesis, the probability distribution of the RS, in an experiment with N variables and K replicates, is exactly the same as the probability distribution of the sum of the outcomes obtained by rolling K dice with N faces
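The dice analogy in the last highlight can be illustrated with a short Python sketch. It is not the RankProd implementation: the function names `dice_sum_counts` and `rank_sum_pvalue` are hypothetical, the RS is assumed here to be the plain sum of the K ranks (a package may report the average rank instead), small sums are assumed to be the interesting tail, and ties are ignored. The sketch counts, by repeated convolution, how many of the N^K equally likely rank combinations give each possible sum, and reports the exact lower-tail probability as a fraction.

```python
from fractions import Fraction


def dice_sum_counts(n_faces, n_dice):
    """Return counts where counts[s] is the number of ways that n_dice dice,
    each with faces 1..n_faces, can sum to s."""
    counts = [1]  # zero dice: exactly one way to reach a sum of 0
    for _ in range(n_dice):
        # Adding one die shifts every reachable sum by 1..n_faces.
        new_counts = [0] * (len(counts) + n_faces)
        for s, ways in enumerate(counts):
            if ways:
                for face in range(1, n_faces + 1):
                    new_counts[s + face] += ways
        counts = new_counts
    return counts


def rank_sum_pvalue(rank_sum, n_features, n_replicates):
    """Exact P(RS <= rank_sum) under the null hypothesis, assuming each
    replicate assigns the feature an independent uniform rank in 1..n_features."""
    if rank_sum < n_replicates:
        return Fraction(0)  # the smallest possible sum is n_replicates
    counts = dice_sum_counts(n_features, n_replicates)
    favourable = sum(counts[: rank_sum + 1])
    return Fraction(favourable, n_features ** n_replicates)


# Two six-sided dice: sums of 2 or 3 occur in 3 of the 36 outcomes.
print(rank_sum_pvalue(3, 6, 2))  # 1/12
```

Exact integer counts and `Fraction` are used so that very small tail probabilities are not lost to rounding; the naive double loop is chosen for readability and would be slow for datasets with many features.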
Summary
Finding differentially expressed molecular features when comparing different conditions plays a pivotal role in all kinds of molecular profiling studies (“omics”). The main identified weakness of the Rank Product (RP) method is its sensitivity to variable-specific measurement variance. This problem has been successfully addressed by a number of variance-stabilizing normalization techniques (Durbin et al., 2002; Huber et al., 2002; Breitling and Herzyk, 2005). P-value estimation was previously performed by a permutation-based method for both the RP and the Rank Sum (RS) statistics (Hong et al., 2006). This method requires a computationally demanding number of permutations to obtain accurate results, and the estimates are unreliable in the tails of the distribution, which is where the most interesting molecular features lie. Given that unpaired datasets are increasingly common, we developed a more principled approach, described in Section 4, which provides a more reliable application of RP and RS in the analysis of unpaired datasets.
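The tail problem with permutation-based estimates can be made concrete with a small Python sketch. This is a simplified stand-in, not the procedure of Hong et al.: instead of permuting expression values and re-ranking, it draws each null rank directly as a uniform integer, and the function name `permutation_tail_estimate` and all parameter values are illustrative assumptions. With B simulated draws the estimate can only take the values 0, 1/B, 2/B and so on, so a true tail probability far below 1/B cannot be resolved.

```python
import random


def permutation_tail_estimate(rank_sum, n_features, n_replicates,
                              n_permutations, seed=0):
    """Monte Carlo estimate of P(RS <= rank_sum) under the null hypothesis.

    Returns (estimate, hits), where hits is the number of simulated rank
    sums that were at least as extreme as the observed one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        # Assumption: under the null, each replicate gives the feature an
        # independent uniform rank in 1..n_features.
        simulated = sum(rng.randint(1, n_features) for _ in range(n_replicates))
        if simulated <= rank_sum:
            hits += 1
    return hits / n_permutations, hits


n_permutations = 10_000
estimate, hits = permutation_tail_estimate(
    rank_sum=3, n_features=1000, n_replicates=3, n_permutations=n_permutations
)

# A feature ranked first in all three replicates has exact probability N^-K.
exact = 1 / 1000 ** 3
resolution = 1 / n_permutations  # smallest non-zero value the estimate can take
print(estimate, exact, resolution)
```

In this setting the exact tail probability is five orders of magnitude below the resolution of the simulation, so the estimate is either zero or a large overestimate unless the number of permutations is increased enormously.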