Abstract
Mass spectrometry (MS) is frequently used for proteomic and metabolomic profiling of biological samples. Data obtained by MS are often zero-inflated. Those zero values are called point mass values (PMVs). Zero values can be further grouped into biological PMVs and technical PMVs. The former type is caused by the true absence of a compound and the latter type is caused by a technical detection limit. Methods based on a mixture model have been developed to separate the two types of zeros and to perform differential abundance analysis comparing proteomic/metabolomic profiles between different groups of subjects. However, we notice that those methods may give unstable estimates of the model variance, and thus lead to false positive and false negative results when the number of non-zero values is small. In this paper, we propose a new differential abundance analysis method, DASEV, which uses an empirical Bayes shrinkage method to more robustly estimate the variance and enhance the accuracy of differential abundance analysis. Simulation studies and real data analysis show that DASEV substantially improves parameter estimation of the mixture model and outperforms current methods in identifying differentially abundant features.
Highlights
In recent years, many proteomic and metabolomic studies have been performed to understand diseases’ biological mechanisms, to identify prognostic and predictive biomarkers, and to develop better treatments[1,2].
Although the mixture model is appealing for distinguishing technical PMVs (TPMVs) from biological PMVs (BPMVs) and for better characterizing mass spectrometry (MS) data, its parameter estimates, especially the variance estimate, are unstable when a large proportion of the values are zero.
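The density of the mixture model is

$$
f(Y_{ik}) =
\begin{cases}
p_{ik} + (1 - p_{ik})\,\Phi\{(\lambda_k - \mu_{ik})/\sigma_k^2\}, & \text{if point mass values (PMVs)} \\
(1 - p_{ik})\,\phi[\{\log(Y_{ik}) - \mu_{ik}\}/\sigma_k^2], & \text{if non-PMVs}
\end{cases}
$$

where $p_{ik}$ is the proportion of BPMVs and $\mu_{ik}$ is the mean of non-BPMVs for feature $k$ in subject $i$, $\sigma_k$ is the standard deviation, $\lambda_k$ is the logarithm of the detection limit for feature $k$, and $\Phi$ and $\phi$ are the cumulative distribution and density functions of a standard normal distribution, respectively.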
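The two-case density described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`mixture_density`, `log_likelihood`) and the argument `scale` are hypothetical. The code mirrors the two cases as they are stated, with `scale` standing for whatever denominator standardizes the terms, and it adds no normalizing factor beyond what the expression shows. A zero abundance is assumed to encode a PMV.

```python
import math


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal (Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    """Density function of the standard normal (phi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def mixture_density(y, p, mu, scale, log_limit):
    """Contribution of one observation under the two-part mixture.

    y         -- observed abundance; 0 is assumed to mark a PMV
    p         -- proportion of biological PMVs (p_ik)
    mu        -- mean of the log abundance for non-BPMVs (mu_ik)
    scale     -- denominator of the standardized terms (hypothetical name)
    log_limit -- logarithm of the detection limit (lambda_k)
    """
    if y == 0:
        # PMV: either a biological zero, or a value below the detection limit.
        return p + (1.0 - p) * std_normal_cdf((log_limit - mu) / scale)
    # Non-PMV: observed value, evaluated on the log scale.
    return (1.0 - p) * std_normal_pdf((math.log(y) - mu) / scale)


def log_likelihood(values, p, mu, scale, log_limit):
    """Sum of log contributions over the observations of one feature."""
    return sum(
        math.log(mixture_density(y, p, mu, scale, log_limit)) for y in values
    )
```

With only a handful of non-zero entries in `values`, the non-PMV branch contributes very few terms, which is one way to see why the variance parameter is weakly determined when most values are zero.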
Summary
Many proteomic and metabolomic studies have been performed to understand diseases’ biological mechanisms, to identify prognostic and predictive biomarkers, and to develop better treatments[1,2]. To demonstrate the impact of underestimating the variance on differential abundance analysis, Fig. 1a shows the top-ranked 150 features based on the mixture model in Taylor et al.[5] (referred to as the TLK method) from a simulated two-group comparison dataset.