Abstract

Comparative proteomics is a powerful analytical method for learning about the responses of biological systems to changes in growth parameters. To make confident inferences about biological responses, proteomics approaches must incorporate appropriate statistical measures of quantitative data. In the present work we applied microarray-based normalization and statistical analysis (significance testing) methods to quantitative proteomics data generated by metabolic labeling of a marine bacterium (Sphingopyxis alaskensis). Quantitative data were generated for 1,172 proteins, from a total of 1,736 high confidence protein identifications (54% genome coverage). To test normalization approaches, cells were grown at a single temperature, metabolically labeled with 14N or 15N, and combined in different ratios to give an artificially skewed data set. Inspection of ratio versus average (MA) plots showed that a fixed-value median normalization was most suitable for these data. To determine an appropriate statistical method for assessing differential abundance, a fold-change approach, Student's t test, an unmoderated t test, and an empirical Bayes moderated t test were applied to proteomics data from cells grown at two temperatures. Inverse metabolic labeling was used with multiple technical and biological replicates, and proteomics was performed either on cells combined on the basis of equal culture optical density (giving skewed data) or on cell extracts combined to give equal amounts of protein (no skew). To account for arbitrarily complex, experiment-specific parameters, the data were analyzed with a linear modeling approach using the limma package in R/Bioconductor. For the two-temperature data, a high quality list of statistically significant differentially abundant proteins was obtained by using lowess normalization (chosen after inspection of MA plots) and applying the empirical Bayes moderated t test.
The approach also effectively controlled the number of false discoveries and corrected for the multiple testing problem using the Storey-Tibshirani false discovery rate (Storey, J. D., and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U.S.A. 100, 9440-9445). The approach we have developed is generally applicable to quantitative proteomics analyses of diverse biological systems.
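A minimal Python sketch of Storey-Tibshirani q-values follows, assuming a single fixed tuning value lam = 0.5 for estimating pi0, the proportion of truly unchanged proteins. The published method can instead smooth the pi0 estimate over a range of lambda values. The p-values and the function name are invented for illustration.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Storey-Tibshirani q-values with a single fixed lambda (an assumption;
    the published method can smooth pi0 over a grid of lambda values)."""
    m = len(pvalues)
    # pi0: proportion of true nulls, estimated from p-values above lambda,
    # where the p-value histogram of unchanged proteins should be flat.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q, pi0

# Hypothetical p-values for ten proteins.
pvals = [0.001, 0.002, 0.004, 0.01, 0.02, 0.03, 0.45, 0.62, 0.81, 0.93]
qvals, pi0 = storey_qvalues(pvals)
print(f"pi0 = {pi0:.2f}")
for p, q in zip(pvals, qvals):
    print(f"p = {p:.3f}  q = {q:.3f}")
```

When pi0 comes out as 1, these q-values reduce to Benjamini-Hochberg adjusted p-values; an estimate below 1 makes them less conservative.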

Highlights

  • Comparative proteomics is a powerful analytical method for learning about the responses of biological systems to changes in growth parameters

  • Technical replicate data set: to assess whether microarray-based analysis methods would be suitable for the analysis of LC-MS data, and whether our system was sensitive enough to detect subtle differences in protein abundance, S. alaskensis cultures were grown at 30 °C, labeled with either 14N or 15N, and combined in known ratios of 0.8:1, 1:1, and 1.2:1, in triplicate

  • These analyses revealed that the majority of proteins are consistent with a normal distribution, although the distribution is heavy tailed. The heavy tails most likely reflect proteins that are naturally variable in their abundance. This pattern resembles that of RNA abundance in two-color microarray analysis, suggesting that microarray normalization approaches are likely to be applicable to quantitative proteomics data
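The highlight does not say which diagnostic was used to assess normality, so the sketch below swaps in one numeric summary, sample excess kurtosis, which is near 0 for normally distributed values and positive for heavy-tailed ones. The data are simulated, not from the study: one normal sample of log2 ratios, and one mixture in which an assumed 5% of proteins are more variable. The standard deviations are hypothetical.

```python
import random
import statistics

def excess_kurtosis(values):
    """Sample excess kurtosis: about 0 for normal data, positive for heavy tails."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n     # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n     # fourth central moment
    return m4 / m2 ** 2 - 3.0

rng = random.Random(1)
# Simulated log2 ratios for unchanged proteins (hypothetical SD of 0.1).
normal_like = [rng.gauss(0.0, 0.1) for _ in range(20000)]
# Same, but about 5% of proteins are "naturally variable" with a wider SD of 0.5.
heavy_tailed = [rng.gauss(0.0, 0.5 if rng.random() < 0.05 else 0.1) for _ in range(20000)]

print(f"normal:       {excess_kurtosis(normal_like):.2f}")
print(f"heavy-tailed: {excess_kurtosis(heavy_tailed):.2f}")
```

A small fraction of highly variable proteins is enough to push the kurtosis well above the normal value, even though most of the distribution still looks normal.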


Introduction

Comparative proteomics is a powerful analytical method for learning about the responses of biological systems to changes in growth parameters. Despite the popularity of this approach, considerable issues remain in how experiments are designed and how the statistical significance of proteomics data is evaluated. These issues include, but are not limited to, experimental design, including the choice of biological and technical replicates (1, 2); sample pooling and preparation (2); peptide identification (3, 4) and quantitation (5); accounting for intersample variation via normalization within and between experiments (6–9); the “missing data problem” (1, 10–12); selecting a robust statistical analysis when often very few replicates are available; and the “multiple testing problem” (13). We focus on four key areas in metabolically labeled quantitative proteomics: normalization, experimental design and linear
