Abstract
Introduction: Quantitative measurements based on liquid chromatography (LC) coupled with mass spectrometry (MS) often suffer from missing values (MVs) and from data heterogeneity caused by technical variability. We used a proteomics data set generated from human kidney biopsy material to investigate the technical effects of sample preparation and quantitative MS.
Methods: We studied the effect of tissue storage methods (TSMs) and tissue extraction methods (TEMs) on data analysis. There are two TSMs, frozen (FR) and formalin-fixed paraffin-embedded (FFPE), and three TEMs: MAX, TX followed by MAX, and SDS followed by MAX. We assessed the impact of different strategies for analyzing the data while accounting for heterogeneity and MVs, and used an analysis of variance (ANOVA) model to study the effects of the various sources of variability.
Results and Conclusion: We found that the FFPE TSM performs better than the FR TSM, and that the one-step TEM (MAX) performs better than the two-step TEMs. Furthermore, we found that imputation is a better approach than excluding proteins with MVs or using an unbalanced design.
Highlights
Quantitative measurements based on liquid chromatography (LC) coupled with mass spectrometry (MS) often suffer from missing values (MVs) and from data heterogeneity caused by technical variability.
We found that imputation is a better approach than excluding proteins with MVs or using an unbalanced design.
MVs in proteomics data can arise from biological and/or technical causes. They fall into three types: (i) missing completely at random (MCAR), in which missingness is independent of both the observed and the unobserved data; (ii) missing at random (MAR), in which, conditional on the observed data, missingness is independent of the unobserved measurements; and (iii) missing not at random (MNAR), when the data are neither MCAR nor MAR [30].
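The three missingness mechanisms can be illustrated with a small simulation. The sketch below is not the authors' code (their analysis was done in R); the function `mask_values`, the batch labels, the detection `limit`, and the toy intensities are all hypothetical and chosen only for illustration. MNAR is modelled here as left-censoring below a detection limit, which is one common form of it in MS data and not the only one.

```python
import random

def mask_values(values, batches, mechanism, rate=0.3, limit=10.0, seed=0):
    """Return a copy of `values` with some entries replaced by None.

    MCAR: every value is dropped with the same probability `rate`.
    MAR:  only values in (observed) batch "B" can be dropped, with probability
          `rate`; the decision ignores the value itself.
    MNAR: a value is dropped when it falls below `limit`, so missingness
          depends on the unobserved measurement (assumed left-censoring).
    """
    rng = random.Random(seed)
    masked = []
    for value, batch in zip(values, batches):
        if mechanism == "MCAR":
            drop = rng.random() < rate
        elif mechanism == "MAR":
            drop = batch == "B" and rng.random() < rate
        elif mechanism == "MNAR":
            drop = value < limit
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        masked.append(None if drop else value)
    return masked

# Toy log-intensities and batch labels (hypothetical data).
values = [8.2, 12.5, 9.1, 15.0, 11.3, 7.4]
batches = ["A", "B", "A", "B", "A", "B"]

for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, mask_values(values, batches, mechanism))
```

Under MNAR the low intensities are the ones that disappear, which is why simply excluding incomplete proteins can bias a comparison toward high-abundance proteins.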
Summary
Quantitative measurements based on liquid chromatography (LC) coupled with mass spectrometry (MS) often suffer from missing values (MVs) and from data heterogeneity caused by technical variability. Various approaches exist for proteomics data analysis; in most, the first step is to summarize the intensities of all features with a quantitative summary, followed by a transformation such as the log transformation to bring the data closer to a normal distribution. Each of these methods has several drawbacks, which can be studied by examining its statistical properties [27,28,29]. We investigated the technical effects of sample preparation and quantitative MS that lead to heterogeneity in the quantification of low-abundance proteins. This will improve biomarker discovery studies that rely on limited bioreposited tissue resources. We performed all statistical analyses in R [35]; the code is available from the authors on request.
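As a rough illustration of the summarize-then-transform step described above, the sketch below sums the observed feature intensities of each protein and takes the base-2 logarithm. The choice of a sum as the quantitative summary, the base of the logarithm, the helper name `summarize_protein`, and the toy intensities are assumptions made for this example; the source does not state which summary or transformation the authors used.

```python
import math

def summarize_protein(feature_intensities):
    """Summarize one protein in one sample as log2 of the summed intensities.

    Features recorded as None (not detected) are skipped. If no feature was
    observed, the protein-level value is itself missing and None is returned.
    """
    observed = [x for x in feature_intensities if x is not None and x > 0]
    if not observed:
        return None
    return math.log2(sum(observed))

# Hypothetical raw feature intensities for three proteins in one sample.
sample = {
    "protein_1": [1024.0, 3072.0],
    "protein_2": [None, 512.0],
    "protein_3": [None, None],
}

summary = {name: summarize_protein(features) for name, features in sample.items()}
print(summary)
```

A protein with no observed features stays missing at the protein level, which is the point at which the choice among exclusion, an unbalanced design, and imputation has to be made.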