Abstract

Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, comparative assessments of imputation accuracy remain inconclusive, mainly because the mechanisms that produce true missing values are complex and existing evaluation methodologies are imperfect. Moreover, few studies have provided an outlook on future methodological development. We first re-evaluate the performance of eight representative methods targeting three typical missing mechanisms. These methods are compared on both simulated and masked missing values embedded within real proteomics datasets, and performance is evaluated using three quantitative measures. We then introduce fused regularization matrix factorization, a low-rank global matrix factorization framework capable of integrating local similarity derived from additional data types. We also explore a biologically inspired latent variable modeling strategy, convex analysis of mixtures, for missing value imputation and present preliminary experimental results. While some winners emerged from our comparative assessment, the evaluation is intrinsically imperfect because performance is measured indirectly on artificial missing or masked values rather than on authentic missing values. Nevertheless, we show that fused regularization matrix factorization provides a novel way to incorporate external and local information, and that the exploratory implementation of convex analysis of mixtures presents a biologically plausible new approach.

Highlights

  • Introduction to the Fused Regularization Matrix Factorization (FRMF) method: low-rank matrix factorization is a popular and effective approach for missing data imputation [13]

  • We explored and tested several variants of FRMF and Convex Analysis of Mixtures (CAM), where local similarity information is obtained from baseline or other data acquired from the same samples

  • In simulation setting 1, the simulation data were generated from the observed data portion of a real proteomics dataset, where artificial missing values were introduced by two typical missing mechanisms and used for performance assessment
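The masking-and-scoring idea behind this kind of simulation can be sketched as follows. This is a minimal illustration, not the paper's protocol: the source does not name the two mechanisms, so the sketch assumes one mechanism that hides entries completely at random and one that hides low-abundance entries. The function names, the noise level, the 20% missing rate, the synthetic matrix, and the column-mean baseline are all hypothetical choices made for the example.

```python
import numpy as np

def mask_mcar(X, rate, rng):
    """Assumed mechanism 1: hide each entry with the same probability."""
    mask = rng.random(X.shape) < rate
    X_miss = X.copy()
    X_miss[mask] = np.nan
    return X_miss, mask

def mask_low_abundance(X, rate, rng, noise_sd=0.3):
    """Assumed mechanism 2: hide the entries whose noisy intensity is lowest."""
    noisy = X + rng.normal(0.0, noise_sd, size=X.shape)
    threshold = np.quantile(noisy, rate)
    mask = noisy < threshold
    X_miss = X.copy()
    X_miss[mask] = np.nan
    return X_miss, mask

def impute_column_mean(X_miss):
    """Trivial baseline: replace each NaN with the mean of its column."""
    col_mean = np.nanmean(X_miss, axis=0)
    return np.where(np.isnan(X_miss), col_mean, X_miss)

def rmse_on_mask(X_true, X_imputed, mask):
    """Error computed only on the entries that were artificially hidden."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(0)
# Synthetic stand-in for the fully observed portion of a log-intensity matrix.
X = rng.normal(20.0, 2.0, size=(200, 12))

X_mcar, m_mcar = mask_mcar(X, 0.2, rng)
X_low, m_low = mask_low_abundance(X, 0.2, rng)

rmse_mcar = rmse_on_mask(X, impute_column_mean(X_mcar), m_mcar)
rmse_low = rmse_on_mask(X, impute_column_mean(X_low), m_low)
```

Because the hidden values are known, any imputation method can be scored on exactly those entries, which is also why the assessment remains indirect: the artificial mask only approximates how values go missing in practice.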


Summary

Introduction

Introduction to the FRMF method: low-rank matrix factorization is a popular and effective approach for missing data imputation [13]. CAM is a latent variable modeling and deconvolution technique previously used for identifying biologically interpretable cell subtypes or biological archetypes S (l × n) and their composition A (m × l) in complex tissue ecosystems [6,14,15,21]. The functions of complex tissues are orchestrated by a productive interplay among many specialized cell subtypes or task archetypes [33]. These biological components interact with each other to create a unique physiological or pathophysiological state.

To generate a peptide spectral library for subsequent identification and quantification of peptides and proteins, peptides from representative specimens were pooled and separated into 80 basic reverse phase fractions. These were analyzed by DDA mass spectrometry to assemble a human vascular peptide assay library. To facilitate rapid transition from discovery into translation, we employed DIA-MS, for which a targeted peptide peak group readily exists for each peptide identified from a protein of interest detected in the discovery phase [21].
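The low-rank matrix factorization idea mentioned at the start of this section can be illustrated with a small sketch. It is not the FRMF method itself: the fused regularization and the local similarity terms are not reproduced here, and only the generic low-rank part is shown, fitted by alternating ridge regressions on the observed entries. The function name `impute_low_rank`, the rank, the penalty `lam`, and the synthetic test matrix are assumptions made for the example.

```python
import numpy as np

def impute_low_rank(X, rank=3, lam=0.1, n_iter=50, seed=0):
    """Fill NaNs in X with a low-rank fit U @ V.T estimated from observed entries only."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    m, n = X.shape
    U = rng.normal(scale=0.1, size=(m, rank))
    V = rng.normal(scale=0.1, size=(n, rank))
    reg = lam * np.eye(rank)  # ridge penalty keeps every solve well posed
    for _ in range(n_iter):
        # Update each row factor from the columns observed in that row.
        for i in range(m):
            cols = observed[i]
            Vi = V[cols]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ X[i, cols])
        # Update each column factor from the rows observed in that column.
        for j in range(n):
            rows = observed[:, j]
            Uj = U[rows]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ X[rows, j])
    X_hat = U @ V.T
    # Keep observed values as they are and fill only the missing ones.
    return np.where(observed, X, X_hat)

# Synthetic rank-2 matrix with about 20% of entries hidden at random.
rng = np.random.default_rng(1)
X_true = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 15))
mask = rng.random(X_true.shape) < 0.2
X_obs = X_true.copy()
X_obs[mask] = np.nan

X_imp = impute_low_rank(X_obs, rank=2, lam=0.01)
rmse = float(np.sqrt(np.mean((X_true[mask] - X_imp[mask]) ** 2)))
```

A method along the lines described in the abstract would add further penalty terms that pull the factors of similar samples or proteins toward each other; how those terms are defined is specific to the paper and is left out of this sketch.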

