Identification of Significant Features by the Global Mean Rank Test

Martin Klammer, J Nikolaj Dybowski, Christoph Schaab, Daniel Hoffmann

doi:10.1371/journal.pone.0104504

Abstract

With the introduction of omics-technologies such as transcriptomics and proteomics, numerous methods for the reliable identification of significantly regulated features (genes, proteins, etc.) have been developed. Experimental practice requires these tests to deal successfully with conditions such as small numbers of replicates, missing values, non-normally distributed expression levels, and non-identical distributions of features. With the MeanRank test we aimed to develop a test that performs robustly under these conditions while scaling favorably with the number of replicates. The test proposed here is a global one-sample location test, which is based on the mean ranks across replicates and internally estimates and controls the false discovery rate. Furthermore, missing data are accounted for without the need for imputation. In extensive simulations comparing MeanRank to other frequently used methods, we found that it performs well with small and large numbers of replicates, feature-dependent variance between replicates, and variable regulation across features, both on simulated data and on a recent two-color microarray spike-in dataset. The tests were then used to identify significant changes in the phosphoproteomes of cancer cells induced by the kinase inhibitors erlotinib and 3-MB-PP1 in two independently published mass spectrometry-based studies. MeanRank outperformed the other global rank-based methods applied in this study. Compared to the popular Significance Analysis of Microarrays and Linear Models for Microarray methods, MeanRank performed similarly or better. Furthermore, MeanRank exhibits more consistent behavior regarding the degree of regulation and is robust against the choice of preprocessing methods. MeanRank does not require any imputation of missing values, is easy to understand, and yields results that are easy to interpret. The software implementing the algorithm is freely available for academic and commercial use.
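The abstract describes the test statistic only in words: features are ranked within each replicate, the ranks are averaged across replicates, and missing values are skipped rather than imputed. The following is a minimal sketch of that ranking step alone, under several assumptions that are not taken from the paper: the input is a list of per-feature rows of log-ratios, missing values are encoded as `None` or NaN, ranks are scaled by the number of observed features in each replicate, and ties are broken by row order. The function names `rank_observed` and `mean_ranks` are hypothetical, and the paper's internal FDR estimation is not reproduced here.

```python
import math


def rank_observed(values):
    """Rank the non-missing entries of one replicate, scaled to (0, 1].

    Missing entries (None or NaN) receive no rank and stay None.
    Ties are broken by position, which is a simplification.
    """
    observed = [
        (v, i)
        for i, v in enumerate(values)
        if v is not None and not math.isnan(v)
    ]
    observed.sort()
    scaled = [None] * len(values)
    n_observed = len(observed)
    for rank, (_, index) in enumerate(observed, start=1):
        # Dividing by the number of observed features keeps replicates
        # with different amounts of missing data on a comparable scale.
        scaled[index] = rank / n_observed
    return scaled


def mean_ranks(matrix):
    """Average the scaled ranks of each feature over its observed replicates.

    matrix: list of feature rows, each a list with one value per replicate.
    Returns one mean rank per feature, or None if a feature was never observed.
    """
    n_replicates = len(matrix[0])
    columns = [
        rank_observed([row[j] for row in matrix]) for j in range(n_replicates)
    ]
    result = []
    for f in range(len(matrix)):
        ranks = [col[f] for col in columns if col[f] is not None]
        result.append(sum(ranks) / len(ranks) if ranks else None)
    return result


# Three features measured in three replicates; one value is missing.
example = [
    [-2.0, -1.5, None],
    [0.1, 0.0, 0.2],
    [1.8, 2.1, 1.9],
]
scores = mean_ranks(example)
```

In this sketch, features with a consistently low or high mean rank across replicates are the candidates for down- or up-regulation; deciding where to place the cutoff is the part that the FDR control of the actual test handles.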

Highlights

Introduction: Omics-technologies are capable of generating vast amounts of data. Typical microarray experiments measure the abundance of thousands of features.
Simulated data: In order to evaluate the performance of the MeanRank test and to compare it with various other tests, we performed an extensive simulation study extending the range of scenarios found in comparable publications [10,13,14] by including more parameters and wider ranges of replicates and methods.

Summary

Introduction

Omics-technologies are capable of generating vast amounts of data. Typical microarray experiments measure the abundance of thousands of features. With recent advances in the field of mass spectrometry (MS), over 10,000 proteins can currently be measured in cell systems [1], while recent studies identified even more phosphorylation sites through quantitative phosphoproteomics [2,3,4]. Many of these microarray and proteomics studies include the detection of differentially regulated features as a core step in the data analysis. For data with thousands of features, the false discovery rate (FDR), defined as the expected proportion of false positive features among those reported as significant, has to be controlled [5]. Microarray technologies often produce non-normally distributed expression levels and non-identical distributions between genes [6].
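To make the FDR concrete: in a simulation or spike-in experiment, where the truly regulated features are known, the realized false discovery proportion of a single analysis can be computed directly, and the FDR is the expectation of that proportion over repeated experiments. The sketch below assumes features are identified by hashable labels and adopts the common convention that the proportion is 0 when nothing is reported; the function name and the feature labels are hypothetical.

```python
def false_discovery_proportion(called, truly_regulated):
    """Fraction of reported features that are not truly regulated.

    called: iterable of feature labels reported as significant.
    truly_regulated: iterable of feature labels known to be regulated.
    Returns 0.0 when no feature is reported (a common convention).
    """
    called = set(called)
    if not called:
        return 0.0
    false_positives = called - set(truly_regulated)
    return len(false_positives) / len(called)


# Four features reported, one of which is not in the ground truth.
reported = ["f1", "f2", "f3", "f4"]
ground_truth = {"f1", "f2", "f3", "f9"}
fdp = false_discovery_proportion(reported, ground_truth)
```

Averaging this quantity over many simulated datasets gives an empirical estimate of the FDR that a method actually achieves, which can then be compared against the level it claims to control.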



Journal: PLoS ONE	Publication Date: Aug 13, 2014
Citations: 37	License type: CC BY 4.0
