Abstract

MS/MS combined with database search methods can identify the proteins present in complex mixtures. High-throughput methods that infer probable peptide sequences from enzymatically digested protein samples raise the question of how best to aggregate the evidence for candidate proteins. Typically, the results of multiple technical and/or biological replicate experiments must be combined to maximize sensitivity. We present a statistical method for estimating probabilities of protein expression that integrates peptide sequence identifications from multiple search algorithms and replicate experimental runs. The method was applied to create a repository of 797 non-homologous zebrafish (Danio rerio) proteins at an empirically validated false identification rate under 1%, as a resource for developing targeted quantitative proteomics assays. We have implemented this statistical method as an analytic module that can be integrated with an existing suite of open-source proteomics software.
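The abstract describes pooling peptide identifications from several search algorithms and replicate runs before any protein-level inference. A minimal Python sketch of that pooling step follows. It assumes each peptide-spectrum match arrives as a (peptide, spectrum, search engine, probability) tuple and that the highest probability seen for a peptide is kept as its overall probability. The max rule is an assumption chosen here because repeated spectra of the same peptide are not independent evidence; it is not necessarily the rule used in the published method. The engine names, spectrum IDs and probabilities are hypothetical.

```python
from collections import defaultdict


def combine_peptide_probabilities(psms):
    """Collapse per-spectrum probabilities into one overall probability per peptide.

    psms: iterable of (peptide, spectrum_id, search_engine, probability) tuples,
    pooled across search engines and replicate runs.
    Assumption: keep the best (maximum) probability observed for each peptide.
    """
    best = defaultdict(float)  # missing peptides start at probability 0.0
    for peptide, _spectrum, _engine, prob in psms:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range: {prob}")
        best[peptide] = max(best[peptide], prob)
    return dict(best)


# Hypothetical matches from two engines ("engineA", "engineB") and two runs.
psms = [
    ("LVNELTEFAK", "run1:scan1021", "engineA", 0.92),
    ("LVNELTEFAK", "run1:scan1021", "engineB", 0.88),
    ("LVNELTEFAK", "run2:scan0874", "engineA", 0.97),
    ("AEFVEVTK", "run2:scan1500", "engineB", 0.41),
]
print(combine_peptide_probabilities(psms))
# {'LVNELTEFAK': 0.97, 'AEFVEVTK': 0.41}
```

A noisy-OR rule, 1 minus the product of (1 - p) over spectra, is a common alternative, but it treats every spectrum as independent and therefore tends to overstate confidence for peptides sampled many times.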

Highlights

  • MS/MS combined with database search methods can identify the proteins present in complex mixtures

  • In “shotgun” proteomics studies, a protein sample is digested with a proteolytic enzyme, and the resulting peptide mixture is separated by LC prior to conducting collision

  • Shotgun proteomics experiments, which identify proteolytic peptides, need statistical methods to infer the presence of the parent proteins
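Inferring parent proteins starts from knowing which peptides each protein can produce. The Python sketch below assumes the common trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages and a hypothetical minimum peptide length of 6. The protein IDs and sequences are made up. It builds each protein's set of theoretical digest products and intersects it with the identified peptides, giving the set of matching peptide identifications for each protein.

```python
import re


def tryptic_digest(sequence, min_length=6):
    """Theoretical digest products of one protein sequence.

    Assumes the common trypsin rule (cleave after K or R unless the next
    residue is P), no missed cleavages, and a hypothetical minimum length.
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_length}


def map_peptides_to_proteins(identified, proteome):
    """For each protein, the identified peptides found among its digest products."""
    matches = {}
    for protein_id, sequence in proteome.items():
        hits = identified & tryptic_digest(sequence)
        if hits:
            matches[protein_id] = hits
    return matches


# Hypothetical proteome and identified peptides.
proteome = {
    "protA": "MSLVNELTEFAKGEEWLSPIRAAGTDFK",
    "protB": "GEEWLSPIRTTKPLVDELSEK",
}
identified = {"MSLVNELTEFAK", "GEEWLSPIR", "TTKPLVDELSEK"}

for protein_id, hits in map_peptides_to_proteins(identified, proteome).items():
    print(protein_id, sorted(hits))
# protA ['GEEWLSPIR', 'MSLVNELTEFAK']
# protB ['GEEWLSPIR', 'TTKPLVDELSEK']
```

The peptide GEEWLSPIR matches both proteins. Shared peptides like this are why peptide evidence cannot simply be counted once per protein, and a protein-level model has to apportion them. Splitting on a zero-width pattern requires Python 3.7 or later.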


Summary

Sample Collection

Zebrafish embryos were bred and raised from natural matings of wild-type tail long fin fish as described previously [33] and staged according to existing protocols [34]. Trypsin (Promega, Madison, WI) digestion was carried out at a 1:20 enzyme:protein (w/w) ratio at 37 °C for 16 h in 0.2% (w/v) RapiGest SF (Waters) in 50 mM ammonium bicarbonate. Off-line fractionation and LC-MS/MS were performed as described previously [25]. Forty fractions were collected, and each was separated by reversed-phase chromatography online with a Thermo Finnigan LTQ ion trap mass spectrometer (Thermo Electron, San Jose, CA) using ESI. Each fraction was injected onto a Vydac C18 column (Everest, 150 × 1 mm, 300 Å, 5 μm; Bodmann, Aston, PA) and held in 0.1% formic acid, 0.01% trichloroacetic acid in water for 10 min before a gradient (30 μl/min) was run over 120 min from 3 to 70% mobile phase B (0.1% formic acid, 0.01% trichloroacetic acid in acetonitrile).

Notation used in the statistical model:
  • Probability that peptide identification i from spectrum s is correct
  • Overall probability that peptide identification i is correct
  • Set of all proteins in the search database
  • Set of true plus homologue protein matches (H)
  • Set of true protein matches (T)
  • Number of proteins in Ω
  • Set of theoretical digest products for protein j
  • Effective length of protein j
  • Set of peptide identifications whose sequence matches protein j (M_j)
  • Number of peptides in M_j
  • Weight of peptide identification i to protein j
  • Total probability of peptide identifications matching protein j
  • Abundance category for protein j
  • Parameter set for abundance category a
  • Number of proteins of abundance a
  • Total length of proteins of abundance a
  • Total number of peptide matches to proteins of abundance a
  • Proportion of proteins of abundance a that are in H
  • Total length of proteins of abundance a that are in H
  • Number of correct peptide identifications to proteins of abundance a
  • True if protein j ∈ H, false otherwise
  • Probability that j ∈ H
  • True if protein j ∈ T, false otherwise
  • Probability that j ∈ T
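The notation defines a weight of each peptide identification to each protein and a total probability of the peptide identifications matching protein j. How the weights are set is not spelled out in this section, so the Python sketch below assumes, hypothetically, that a peptide matching n proteins contributes weight 1/n to each, and that the total for protein j is the weighted sum of overall peptide probabilities over its matching peptides. Protein IDs, peptides and probabilities are illustrative only.

```python
from collections import Counter


def peptide_weights(matches):
    """Return w[(peptide, protein)] for every peptide-protein match.

    Hypothetical equal-split rule: a peptide whose sequence matches n proteins
    contributes 1/n to each of them, so its weights sum to 1.
    """
    # Each protein's peptides form a set, so this counts proteins per peptide.
    counts = Counter(pep for peps in matches.values() for pep in peps)
    return {
        (pep, prot): 1.0 / counts[pep]
        for prot, peps in matches.items()
        for pep in peps
    }


def total_probability(matches, peptide_prob):
    """Weighted sum of overall peptide probabilities over each protein's matching peptides."""
    weights = peptide_weights(matches)
    return {
        prot: sum(weights[(pep, prot)] * peptide_prob[pep] for pep in peps)
        for prot, peps in matches.items()
    }


# Illustrative inputs: matching peptides per protein and overall peptide probabilities.
matches = {
    "protA": {"MSLVNELTEFAK", "GEEWLSPIR"},
    "protB": {"GEEWLSPIR", "TTKPLVDELSEK"},
    "protC": {"AAGTDFK"},
}
peptide_prob = {
    "MSLVNELTEFAK": 0.97,
    "GEEWLSPIR": 0.80,
    "TTKPLVDELSEK": 0.60,
    "AAGTDFK": 0.30,
}

for prot, total in sorted(total_probability(matches, peptide_prob).items()):
    print(f"{prot}: {total:.2f}")
# protA: 1.37
# protB: 1.00
# protC: 0.30
```

Under this rule the total can exceed 1, so it behaves like an expected number of correct peptide identifications rather than a probability that the protein is present. Converting such totals into the probabilities that j ∈ H or j ∈ T is the role of the statistical model itself, which this sketch does not attempt.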

Data Processing
Statistical Model
EBP Implementation of the Algorithm
Analyses of a Test Protein Mixture
RESULTS
DISCUSSION