Abstract

Background: Missing values are a key issue in the statistical analysis of proteomic data. Defining the strategy to address missing values is a complex task in each study, potentially affecting the quality of statistical analyses.

Results: We have developed OptiMissP, a dashboard to visually and qualitatively evaluate missingness and guide decision making in the handling of missing values in proteomics studies that use data-independent acquisition mass spectrometry. It provides a set of visual tools to retrieve information about missingness through protein densities and topology-based approaches, and facilitates exploration of different imputation methods and missingness thresholds.

Conclusions: OptiMissP provides support for researchers’ and clinicians’ qualitative assessment of missingness in proteomic datasets in order to define study-specific strategies for the handling of missing values. OptiMissP considers biases in protein distributions related to the choice of imputation method and helps analysts to balance the information loss caused by low missingness thresholds and the noise introduced by selecting high missingness thresholds. This is complemented by topological data analysis, which provides additional insight into the structure of the data and their missingness. We use an example in chronic kidney disease to illustrate the main functionalities of OptiMissP.

Introduction

Proteomics can provide comprehensive protein profiling of clinical and biological research samples, which has enabled the discovery of proteomic biomarkers for patient stratification and disease activity. Sequential window acquisition of all theoretical mass spectra (SWATH-MS) has enabled these advances, as this data-independent acquisition (DIA) mass spectrometry technique guarantees high reproducibility in peptide identification and identifies more peptides than data-dependent methods [1]. Less abundant peptides are harder to detect with data-dependent acquisition and are therefore more likely to be missing [2]; there is a substantial literature on strategies to handle such missing values [3,4,5]. Missing values are a key issue in the statistical analysis of proteomic data.
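
To make the notions of a missingness threshold and an imputation method concrete, the following is a minimal Python sketch. It is not OptiMissP's implementation: the function names, the toy log-intensity values, and the choice of minimum and median imputation are all assumptions made for illustration. It shows how a low threshold discards proteins (information loss), while a high threshold retains proteins whose values are mostly imputed (potential noise).

```python
from statistics import median


def missing_fraction(values):
    """Fraction of samples in which a protein was not quantified (None)."""
    return sum(v is None for v in values) / len(values)


def filter_by_missingness(intensities, threshold):
    """Keep proteins whose missing fraction is at most `threshold`."""
    return {
        protein: values
        for protein, values in intensities.items()
        if missing_fraction(values) <= threshold
    }


def impute(values, method="min"):
    """Replace missing values with the minimum or median of the observed ones.

    Both methods are generic illustrative choices, not a recommendation.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a protein with no observed values")
    if method == "min":
        fill = min(observed)
    elif method == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown imputation method: {method}")
    return [fill if v is None else v for v in values]


# Hypothetical log-intensities for 3 proteins across 4 samples.
data = {
    "P1": [20.1, 19.8, 20.4, 20.0],   # complete
    "P2": [15.2, None, 14.9, 15.5],   # 25% missing
    "P3": [None, None, 12.1, None],   # 75% missing
}

for threshold in (0.0, 0.25, 0.75):
    kept = filter_by_missingness(data, threshold)
    imputed = {p: impute(v, method="min") for p, v in kept.items()}
    print(f"threshold={threshold}: kept {sorted(imputed)}")
```

In this toy example, a threshold of 0.0 keeps only the complete protein, whereas a threshold of 0.75 also keeps `P3`, for which three of the four values would be imputed from a single observation.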
