PROTEIN IDENTIFICATION USING SEQUENCE DATABASES

A Ismailova, Ye Golenko, Ye Rais

doi:10.37943/aitu.2020.91.98.002

Abstract

The bottom-up proteomics approach (also known as the shotgun approach), based on the digestion of proteins into peptides and their sequencing by tandem mass spectrometry (MS/MS), has become widespread. Peptides are most often identified from the resulting MS/MS data by searching available sequence databases. In this paper, we present a detailed overview of the peptide identification workflow and describe the main protein bioinformatics databases. Choosing the correct search parameters and sequence database is essential to the success of this method, so we pay special attention to the practical aspects of searching for efficient analysis of MS/MS spectra. We also consider possible reasons why database search tools fail to find the correct sequence for some MS/MS spectra, and highlight the problem of misidentification, which can significantly reduce the value of published data. To help assess the assignment of peptides to MS/MS spectra, we review the scoring algorithms used in the most popular database search tools. We also analyze statistical methods and computational tools for validating peptide assignments to MS/MS spectra. The final part describes how the identity of the proteins in a sample is inferred from a list of peptide identifications and discusses the limitations of bottom-up proteomics.

Highlights

The most widely used approach to protein analysis is the bottom-up proteomics strategy [1-3].
For non-tryptic peptides, even when their sequences are already present in the database, the number of candidate sequences the search engine must consider increases significantly, which complicates reliable identification.
Identifying peptides and proteins by database search is the simplest and most common method for interpreting MS/MS data. This strategy applies only to known proteins whose sequences have been entered into databases.

Summary

Introduction

The most widely used approach to protein analysis is the bottom-up proteomics strategy [1-3]. In a database search, the program generates all possible proteolytic peptides for each protein sequence in the database and, for each peptide, calculates its mass and a theoretical tandem mass spectrum, taking into account the ion types characteristic of the fragmentation method specified by the user. For non-tryptic peptides, even when their sequences are already present in the database, the number of candidate sequences the search engine must consider increases significantly, which complicates reliable identification.
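
The candidate-generation step described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular search engine. It assumes trypsin specificity (cleavage after K or R, but not before P), singly charged b and y ions as the ion types typical of collision-induced dissociation, and no modifications. The function names, the toy protein sequence, and the peptide length limits are hypothetical and chosen only for illustration.

```python
# Monoisotopic residue masses in daltons (standard values, unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to a free peptide
PROTON = 1.00728   # mass of the charge-carrying proton


def tryptic_digest(protein, missed_cleavages=0):
    """In-silico trypsin digest: cleave after K or R unless followed by P."""
    sites = [0]
    for i, residue in enumerate(protein[:-1]):
        if residue in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))
    peptides = []
    for i in range(len(sites) - 1):
        # Allow up to `missed_cleavages` skipped cleavage sites per peptide.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(sites))):
            peptides.append(protein[sites[i]:sites[j]])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def fragment_ions(peptide):
    """Theoretical m/z values of singly charged b and y ions."""
    masses = [RESIDUE_MASS[residue] for residue in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


def nonspecific_peptides(protein, min_len=6, max_len=30):
    """Every subsequence within the length limits (no enzyme specificity)."""
    n = len(protein)
    return [
        protein[i:j]
        for i in range(n)
        for j in range(i + min_len, min(i + max_len, n) + 1)
    ]


if __name__ == "__main__":
    toy_protein = "AGKPEPTIDERSAMPLEK"  # hypothetical sequence
    tryptic = tryptic_digest(toy_protein)
    unspecific = nonspecific_peptides(toy_protein)
    print("tryptic candidates:", tryptic)
    print("non-specific candidates:", len(unspecific))
    for peptide in tryptic:
        b_ions, y_ions = fragment_ions(peptide)
        print(peptide, round(peptide_mass(peptide), 4), len(b_ions), len(y_ions))
```

Even for this short toy sequence, dropping enzyme specificity raises the number of candidate peptides from two to several dozen. This is the growth in search space that makes reliable identification of non-tryptic peptides harder.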

Journal: Scientific Journal of Astana IT University
Publication Date: Dec 25, 2020
Citations: 1
License type: cc-by
