A comparative study of evaluating missing value imputation methods in label-free proteomics

Liang Jin, Shichen Shen, Yu Tian, Yingtao Bi, Jun Qu, Chenqi Hu, Xue Wang

doi:10.1038/s41598-021-81279-4


Open Access

https://doi.org/10.1038/s41598-021-81279-4


Abstract

The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces the completeness of data. Imputation has been widely utilized to handle MVs, and selection of the proper method is critical for the accuracy and reliability of imputation. Here we present a comparative study that evaluates the performance of seven popular imputation methods with a large-scale benchmark dataset and an immune cell dataset. Simulated MVs were incorporated into the complete part of each dataset with different combinations of MV rates and missing not at random (MNAR) rates. Normalized root mean square error (NRMSE) was applied to evaluate the accuracy of protein abundances and intergroup protein ratios after imputation. Detection of true positives (TPs) and the false altered-protein discovery rate (FADR) between groups were also compared using the benchmark dataset. Furthermore, the accuracy of handling real MVs was assessed by comparing enriched pathways and signature genes of cell activation after imputing the immune cell dataset. We observed that the accuracy of imputation is primarily affected by the MNAR rate rather than the MV rate, and that downstream analysis can be strongly affected by the choice of imputation method. A random forest-based imputation method consistently outperformed other popular methods by achieving the lowest NRMSE, a high number of TPs with an average FADR < 5%, and the best detection of relevant pathways and signature genes, highlighting it as the most suitable method for label-free proteomics.
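The evaluation design described in the abstract (mask known values at a chosen MV rate and MNAR rate, impute, then score with NRMSE) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' procedure: the function names `simulate_missing` and `nrmse` are hypothetical, MNAR is modeled simply as removal of the lowest abundances, the remaining MVs are removed uniformly at random, and NRMSE is normalized by the variance of the true values at the masked positions, which is one common convention. Mean imputation stands in for a real method only to keep the example short.

```python
import math
import random
import statistics


def simulate_missing(values, mv_rate, mnar_rate, seed=0):
    """Return the set of indices to mask in a flat list of abundances.

    mv_rate:   fraction of all values to remove.
    mnar_rate: fraction of the removed values that are MNAR
               (assumed here to be the lowest abundances).
    """
    rng = random.Random(seed)
    n_missing = round(mv_rate * len(values))
    n_mnar = round(mnar_rate * n_missing)
    # MNAR part: censor the lowest-abundance values (assumed mechanism).
    order = sorted(range(len(values)), key=lambda i: values[i])
    mnar_idx = set(order[:n_mnar])
    # Remaining part: remove values uniformly at random from what is left.
    remaining = [i for i in range(len(values)) if i not in mnar_idx]
    random_idx = set(rng.sample(remaining, n_missing - n_mnar))
    return mnar_idx | random_idx


def nrmse(true_values, imputed_values, masked):
    """RMSE over masked entries, normalized by the variance of their true values."""
    idx = sorted(masked)
    truth = [true_values[i] for i in idx]
    squared_errors = [(imputed_values[i] - true_values[i]) ** 2 for i in idx]
    return math.sqrt(statistics.fmean(squared_errors) / statistics.pvariance(truth))


# Illustrative log2 abundances (made-up numbers, not data from the study).
values = [18.2, 21.5, 19.9, 24.1, 17.3, 22.8, 20.4, 23.6, 18.9, 21.1,
          19.2, 25.0, 16.8, 22.1, 20.9, 23.0, 17.9, 21.8, 19.5, 24.4]

masked = simulate_missing(values, mv_rate=0.3, mnar_rate=0.5, seed=1)

# Placeholder imputation: fill every masked entry with the observed mean.
observed = [v for i, v in enumerate(values) if i not in masked]
fill = statistics.fmean(observed)
imputed = [fill if i in masked else v for i, v in enumerate(values)]

score = nrmse(values, imputed, masked)
print(f"masked {len(masked)} of {len(values)} values, NRMSE = {score:.3f}")
```

Repeating this over a grid of `mv_rate` and `mnar_rate` values, and swapping in each imputation method under test, gives the kind of comparison the abstract describes. A lower NRMSE means the imputed values sit closer to the values that were masked.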

Highlights

The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces the completeness of data
Key applications of label-free proteomics include the discovery of biomarkers and new drug targets, but a major issue is that the power of statistical inference and downstream functional analysis is greatly impacted by the presence of missing values (MVs) in the protein abundance data
Our results revealed that the random forest (RF) and local least squares (LLS) imputation methods consistently performed better than the other methods, and RF slightly outperformed LLS in terms of protein ratio estimation and detection of differentially expressed (DE) proteins
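To make the detection metrics named above concrete, here is a small sketch of how TPs and FADR could be counted on a benchmark where the truly altered proteins are known. The definition used, FADR as the share of detected proteins that are not truly altered, is an assumption by analogy with a false discovery proportion; this page does not give the formula. The function name `detection_summary` and the protein identifiers are hypothetical.

```python
def detection_summary(detected, truly_altered):
    """Count true/false positives and compute FADR (assumed: FP / all detected)."""
    detected, truly_altered = set(detected), set(truly_altered)
    tp = len(detected & truly_altered)   # altered proteins correctly detected
    fp = len(detected - truly_altered)   # unchanged proteins wrongly detected
    fadr = fp / len(detected) if detected else 0.0
    return {"TP": tp, "FP": fp, "FADR": fadr}


# Made-up identifiers: P* are truly altered, B* are unchanged background.
summary = detection_summary(
    detected=["P1", "P2", "P3", "P4", "B1"],
    truly_altered=["P1", "P2", "P3", "P4", "P5"],
)
print(summary)
```

Under this assumed definition, an imputation method does well when it keeps the TP count high while holding FADR under a chosen threshold such as the 5% mentioned in the abstract.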

Summary

Introduction

The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces the completeness of data. The accuracy of handling real MVs was assessed by comparing enriched pathways and signature genes of cell activation after imputing the immune cell dataset. A random forest-based imputation method consistently outperformed other popular methods by achieving the lowest NRMSE, a high number of TPs with an average FADR < 5%, and the best detection of relevant pathways and signature genes, highlighting it as the most suitable method for label-free proteomics. Key applications of label-free proteomics include the discovery of biomarkers and new drug targets, but a major issue is that the power of statistical inference and downstream functional analysis is greatly impacted by the presence of MVs in the protein abundance data. Global structure methods have been introduced to proteomics because they can handle mixed types of MVs [3,5].



Journal: Scientific Reports
Publication Date: Jan 19, 2021
Citations: 73
License type: open-access


