Incorrect Peptide Identifications Research Articles

Proteolytic digestion of proteins by one or more proteases is a key step in shotgun proteomics, in which the proteolytic products, i.e., peptides, serve as surrogates of their parent proteins for further qualitative or quantitative analysis. Proteases generally cleave proteins at specific amino acid residues, but digestion is rarely complete, and missed cleavage sites are widespread. Accurately modeling and predicting the digestion behavior of proteases would therefore help improve both upfront experimental design and downstream data analysis. At present, systematic studies of the proteases commonly used in proteomics are insufficient, and easy-to-use tools for predicting the cleavage sites of different proteases are lacking. Here, we propose DeepDigest, a novel sequence-based deep learning algorithm that integrates convolutional neural networks and long short-term memory networks for protein digestion prediction. DeepDigest predicts the cleavage probability of each potential cleavage site on a protein sequence for eight popular proteases: trypsin, ArgC, chymotrypsin, GluC, LysC, AspN, LysN, and LysargiNase. We compared DeepDigest with three traditional machine learning algorithms, i.e., logistic regression, random forest, and support vector machine. On the eight training data sets, the 10-fold cross-validation AUCs of DeepDigest were 0.956-0.982, significantly higher than those of the three traditional algorithms. On the 11 independent test data sets, DeepDigest achieved AUCs between 0.849 and 0.978, outperforming the traditional algorithms in most cases. Transfer learning further improved the prediction accuracy. In addition, some interesting characteristics of the different proteases were revealed and discussed. Finally, as an application, we used DeepDigest to predict the digestibilities of peptides and demonstrated that peptide digestibility is an informative new feature for discriminating between correct and incorrect peptide identifications.
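
As a rough illustration of what "potential cleavage sites" and "missed cleavage sites" mean for a sequence-based predictor, the sketch below enumerates candidate trypsin sites (C-terminal to K or R), extracts a fixed-width sequence window around each site, and counts missed cleavages in a peptide. It is not DeepDigest's code: the function names, the window width (`flank=3`), and the padding character are assumptions made for illustration, and the debated proline exception for trypsin is deliberately ignored.

```python
from typing import List, Set

# Trypsin cleaves C-terminal to lysine (K) or arginine (R).
TRYPSIN_RESIDUES: Set[str] = {"K", "R"}


def candidate_sites(protein: str, residues: Set[str] = TRYPSIN_RESIDUES) -> List[int]:
    """Return 0-based indices of residues after which cleavage may occur.

    The last residue is skipped because there is no peptide bond after it.
    """
    return [i for i, aa in enumerate(protein[:-1]) if aa in residues]


def site_window(protein: str, site: int, flank: int = 3, pad: str = "-") -> str:
    """Return the residues around a site, padded at the protein termini.

    The window width is a hypothetical choice, not taken from the paper.
    """
    padded = pad * flank + protein + pad * flank
    center = site + flank
    return padded[center - flank : center + flank + 1]


def missed_cleavages(peptide: str, residues: Set[str] = TRYPSIN_RESIDUES) -> int:
    """Count internal candidate sites in a peptide that were not cleaved."""
    return sum(1 for aa in peptide[:-1] if aa in residues)
```

A model of the kind described in the abstract would take a window like the one returned by `site_window` as input and output a cleavage probability for that site; how those per-site probabilities are combined into a peptide digestibility score is not specified here.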

Reliable quantification of low-abundance proteins in complex proteomes is challenging, largely owing to the limited number of spectra/peptides identified. In this study we developed a straightforward method to improve the quantitative accuracy and precision of proteins by strategically retrieving the less confident peptides that were previously filtered out by the standard target-decoy search strategy. The filtered-out MS/MS spectra matched to confidently identified proteins were recovered, and the peptide-spectrum-match FDR was recalculated and controlled at a confident level of FDR≤1%, while the protein FDR was maintained at ~1%. We evaluated the performance of this strategy in both spectral count- and ion current-based methods. Total quantified spectra/peptides increased by more than 60% in both a spike-in sample set and a public dataset from CPTAC. Incorporating the peptide retrieval strategy significantly improved quantitative accuracy and precision, especially for low-abundance proteins (e.g., one-hit proteins). Moreover, the capacity to confidently discover significantly altered proteins was also enhanced substantially, as demonstrated with two spike-in datasets. In summary, this peptide recovery strategy improved quantitative performance without compromising the confidence of protein identification, and it can be readily implemented in a broad range of quantitative proteomics techniques, including label-free and labeling approaches.

Significance: We hypothesize that more quantifiable spectra and peptides for a protein, even including less confident peptides, could help reduce variation and improve protein quantification. Hence the peptide retrieval strategy was developed and evaluated in two spike-in sample sets with different LC-MS/MS variations using both MS1- and MS2-based quantitative approaches. The list of proteins confidently identified by the standard target-decoy search strategy was fixed, and additional, less confident spectra/peptides matched to those proteins were retrieved. The total peptide-spectrum-match false discovery rate (PSM FDR) after retrieval was nevertheless still controlled at a confident level of FDR≤1%. As expected, the penalty for occasionally incorporating incorrect peptide identifications is negligible compared with the improvements in quantitative performance. For the same protein identifications, this simple strategy yielded significantly more quantifiable peptides, a lower missing-value rate, and better quantitative accuracy and precision. The strategy is theoretically applicable to any quantitative approach in proteomics and thereby provides more quantitative information, especially on low-abundance proteins.
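
The PSM-level FDR control mentioned above can be sketched with the common target-decoy estimate, in which the FDR at a score cutoff is approximated by the number of decoy matches divided by the number of target matches above that cutoff. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name, the `(score, is_decoy)` input layout, the "higher score is better" convention, and the simple decoys/targets estimator are all illustrative choices, and tied scores are not treated specially.

```python
from typing import List, Optional, Tuple


def score_threshold_at_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: (score, is_decoy) pairs; a higher score means a better match.
    The FDR is estimated as decoys / targets among PSMs at or above the cutoff.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best: Optional[float] = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the most permissive cutoff that still meets the limit.
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best
```

In a retrieval setting like the one described, the same estimate would be recomputed over the enlarged set of PSMs (the original confident ones plus the recovered ones) to check that the overall PSM FDR still stays within the 1% limit.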

Articles published on Incorrect Peptide Identifications

DeepDigest: Prediction of Protein Proteolytic Digestion with Deep Learning.

A peptide-retrieval strategy enables significant improvement of quantitative performance without compromising confidence of identification

Search Databases and Statistics: Pitfalls and Best Practices in Phosphoproteomics.

Analytical performance of reciprocal isotope labeling of proteome digests for quantitative proteomics and its application for comparative studies of aerobic and anaerobic Escherichia coli proteomes

Improvements to the Percolator Algorithm for Peptide Identification from Shotgun Proteomics Data Sets

Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines

Improved Sequence Tag Generation Method for Peptide Identification in Tandem Mass Spectrometry

Oscore: a combined score to reduce false negative rates for peptide identification in tandem mass spectrometry analysis

Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry

An evaluation for cross‐species proteomics research by publicly available expressed sequence tag database search using tandem mass spectral data

Integrated Approach for Manual Evaluation of Peptides Identified by Searching Protein Sequence Databases with Tandem Mass Spectra
