Abstract
Detection of differentially expressed proteins (DEPs) is one of the major goals of quantitative proteomics and plays an important role in biomarker selection and clinical diagnostics. Many algorithms and tools for DEP detection are available in proteomics research. However, because these methods differ in their scope of application and experimental designs vary widely, the best choice for large-scale proteomics data analysis is not obvious. Moreover, because proteomics data usually contain a high percentage of missing values (MVs) but few replicates, a systematic evaluation of DEP detection methods in combination with MV imputation methods is needed. Here, we evaluated four representative imputation methods and five DEP detection methods on experimental and simulated datasets. The results showed that (i) MV imputation did not always improve the performance of DEP detection methods, and the effect of imputation varied with the percentage of missing values; and (ii) the statistical power of the DEP detection methods depended on the percentage of MVs. Two statistical methods (the empirical Bayesian random censoring threshold model and significance analysis of microarrays) performed better than the other evaluated methods in terms of accuracy and sensitivity.
Highlights
Owing to the rapid improvement of high-resolution mass spectrometers, the focus of proteomics research is shifting from qualitative to quantitative analysis[1]
Although some differentially expressed protein (DEP) detection methods can be applied to datasets containing missing values (MVs), their statistical power tends to be limited by the widely varying percentage of MVs in proteomics data
Four popular imputation methods and five representative DEP detection methods were comprehensively evaluated on two experimental datasets and nine simulated datasets to answer three questions: (1) What is the maximum MV percentage that imputation methods can handle? (2) To what extent does imputation affect the performance of DEP detection methods? (3) Which combination of MV imputation and DEP detection methods is most suitable for proteomics data?
Summary
Owing to the rapid improvement of high-resolution mass spectrometers, the focus of proteomics research is shifting from qualitative to quantitative analysis[1]. In quantitative proteomics, it is important to determine protein expression levels accurately and to detect DEPs across experimental conditions (groups or samples). Although some DEP detection methods can be applied to datasets containing MVs, their statistical power tends to be limited by the widely varying percentage of MVs in proteomics data. Webb-Robertson et al.[9] reviewed selected imputation methods for label-free quantitative proteomics, but did not consider how these imputation strategies influence subsequent DEP detection. We therefore performed a systematic evaluation of DEP detection methods and MV imputation methods across experimental designs with different numbers of replicates and different MV percentages. Our aim is to evaluate the statistical power of DEP detection methods before and after MV imputation. Four popular imputation methods and five representative DEP detection methods were comprehensively evaluated on two experimental datasets and nine simulated datasets to answer three questions: (1) What is the maximum MV percentage that imputation methods can handle? (2) To what extent does imputation affect the performance of DEP detection methods? (3) Which combination of MV imputation and DEP detection methods is most suitable for proteomics data?
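The evaluation design described above (simulate data, remove values, optionally impute, then detect DEPs and score the calls) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the simulation parameters, row-mean imputation, the Welch t-statistic, and the fixed |t| cutoff are simple stand-ins and are not the four imputation methods or five DEP detection methods evaluated in the study.

```python
import random
import statistics


def simulate(n_proteins=200, n_dep=40, n_rep=4, shift=2.0, seed=0):
    """Simulate log-intensities for two groups; the first n_dep proteins are true DEPs."""
    rng = random.Random(seed)
    group_a, group_b, truth = [], [], []
    for i in range(n_proteins):
        base = rng.gauss(20.0, 2.0)
        is_dep = i < n_dep
        delta = shift if is_dep else 0.0
        group_a.append([rng.gauss(base, 0.5) for _ in range(n_rep)])
        group_b.append([rng.gauss(base + delta, 0.5) for _ in range(n_rep)])
        truth.append(is_dep)
    return group_a, group_b, truth


def mask_random(rows, mv_fraction, rng):
    """Set each value to None with probability mv_fraction (missing completely at random)."""
    return [[None if rng.random() < mv_fraction else v for v in row] for row in rows]


def impute_row_mean(rows):
    """Fill missing values with the mean of the observed values in the same row.

    Rows with no observed values are left fully missing.
    """
    out = []
    for row in rows:
        observed = [v for v in row if v is not None]
        fill = statistics.fmean(observed) if observed else None
        out.append([fill if v is None else v for v in row])
    return out


def welch_t(x, y):
    """Welch t-statistic on observed values; None if it cannot be computed."""
    x = [v for v in x if v is not None]
    y = [v for v in y if v is not None]
    if len(x) < 2 or len(y) < 2:
        return None
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    if se == 0:
        return None
    return (statistics.fmean(x) - statistics.fmean(y)) / se


def evaluate(mv_fraction, impute, threshold=4.0, seed=1):
    """Score DEP calls (|t| > threshold, a hypothetical cutoff) against the simulated truth."""
    group_a, group_b, truth = simulate(seed=seed)
    rng = random.Random(seed + 1)
    group_a = mask_random(group_a, mv_fraction, rng)
    group_b = mask_random(group_b, mv_fraction, rng)
    if impute:
        group_a, group_b = impute_row_mean(group_a), impute_row_mean(group_b)
    true_pos = false_pos = 0
    for x, y, is_dep in zip(group_a, group_b, truth):
        t = welch_t(x, y)
        if t is not None and abs(t) > threshold:
            if is_dep:
                true_pos += 1
            else:
                false_pos += 1
    return {"sensitivity": true_pos / sum(truth), "false_positives": false_pos}


if __name__ == "__main__":
    # Compare detection with and without imputation at several MV percentages.
    for fraction in (0.0, 0.1, 0.3, 0.5):
        print(fraction, evaluate(fraction, impute=False), evaluate(fraction, impute=True))
```

Comparing the two results at each missing-value fraction mirrors the "before and after imputation" comparison. Note that row-mean imputation shrinks within-group variance, so in a sketch like this it can inflate test statistics and false positives, which is one plausible way imputation may fail to improve detection.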