Abstract

Background: Advances in mass spectrometry-based proteomics have enabled the incorporation of proteomic data into systems approaches to biology. However, development of analytical methods has lagged behind. Here we describe an empirical Bayes framework for quantitative proteomics data analysis. The method provides a statistical description of each experiment, including the number of proteins that differ in abundance between two samples, the experiment's statistical power to detect them, and the false-positive probability of each protein.

Methodology/Principal Findings: We analyzed two types of mass spectrometric experiments. First, we showed that the method identified the protein targets of small molecules in affinity purification experiments with high precision. Second, we re-analyzed a mass spectrometric data set designed to identify proteins regulated by microRNAs. Our results were supported by sequence analysis of the 3′ UTRs of predicted target genes, and we found that the previously reported conclusion that a large fraction of the proteome is regulated by microRNAs was not supported by our statistical analysis of the data.

Conclusions/Significance: Our results highlight the importance of rigorous statistical analysis of proteomic data, and the method described here provides a statistical framework to robustly and reliably interpret such data.

Highlights

  • Recent advances in mass spectrometry (MS)-based proteomics technology have enabled the investigation of proteomes at a systems level [1]

  • We motivated our approach by extending the empirical Bayes framework of Efron [28,29], which was developed in the context of gene expression analysis and overcomes the constraints of the Gaussian mixture model by allowing more flexible modeling of the data

  • Modern high-throughput technologies in experimental biology produce large-scale data sets consisting of hundreds or thousands of measurements, presenting simultaneous inference challenges not anticipated by classical statistical methods that were designed for problems with small numbers of data points and limited computational power


Summary

Introduction

Recent advances in mass spectrometry (MS)-based proteomics technology have enabled the investigation of proteomes at a systems level [1]. A critical issue that remains less well addressed is the development of statistical models to identify biologically relevant proteins based on SILAC ratio values summarized at the protein level (e.g., the median XIC ratio for all peptides identifying a protein, generally log transformed to treat over- and under-abundance symmetrically) [17,18]. Such statistical estimates are critical because variations in relative abundance measurements arise from confounding factors such as spectral background noise, interfering signals from co-eluting peptides, differential lysis efficiencies, isotope impurities, and incomplete incorporation of the isotope label. The method described here provides a statistical description of each experiment, including the number of proteins that differ in abundance between two samples, the experiment's statistical power to detect them, and the false-positive probability of each protein.
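The protein-level summary mentioned above (the median XIC ratio over a protein's peptides, then log transformed) can be illustrated with a minimal sketch. The function name `protein_log2_ratios`, the `(protein_id, ratio)` input layout, and the choice of base-2 logarithms are assumptions made for illustration; the source does not specify them, and this is not the authors' implementation.

```python
import math
from collections import defaultdict
from statistics import median


def protein_log2_ratios(peptide_ratios):
    """Summarize peptide-level ratios into one log2 ratio per protein.

    peptide_ratios: iterable of (protein_id, ratio) pairs, where ratio is a
    positive heavy/light XIC ratio for one peptide (hypothetical layout).
    """
    by_protein = defaultdict(list)
    for protein_id, ratio in peptide_ratios:
        if ratio <= 0:
            raise ValueError(f"ratio must be positive, got {ratio!r}")
        by_protein[protein_id].append(ratio)

    # Median across peptides first, then log2 (base 2 is an assumption),
    # so that 2-fold up and 2-fold down map to +1 and -1 respectively.
    return {
        protein_id: math.log2(median(ratios))
        for protein_id, ratios in by_protein.items()
    }


# Illustrative, made-up peptide ratios for two hypothetical proteins.
example = [
    ("P1", 2.0), ("P1", 4.0), ("P1", 8.0),
    ("P2", 0.5), ("P2", 0.25), ("P2", 0.125),
]
summary = protein_log2_ratios(example)
```

With these made-up values, a 4-fold enrichment and a 4-fold depletion sit at equal distance from zero on the log scale, which is the symmetry the log transform is meant to provide.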

