Abstract
De novo sequencing of tandem (MS/MS) mass spectra represents the only way to determine the sequence of proteins from organisms with unknown genomes, or the ones not directly inscribed in a genome—such as antibodies, or novel splice variants. Top-down mass spectrometry provides new opportunities for analyzing such proteins; however, retrieving a complete protein sequence from top-down MS/MS spectra still remains a distant goal. In this paper, we review the state-of-the-art on this subject, and enhance our previously developed Twister algorithm for de novo sequencing of peptides from top-down MS/MS spectra to derive longer sequence fragments of a target protein.
Highlights
De novo sequencing of peptides and proteins from tandem (MS/MS) mass spectrometry data is an important and challenging problem that has attracted the attention of specialists in the field for a few decades.
The only method for de novo sequencing of proteins solely from top-down MS/MS data was the one by Horn et al. [29], which capitalizes on the complementarity of collisionally activated dissociation (CAD) and electron capture dissociation (ECD) but has never become publicly available as a software program.
We have proposed a method for combining sequence fragments of proteins from the sample being analyzed into longer subsequences containing gaps, with an accurate estimate reported for each gap.
Summary
De novo sequencing of peptides and proteins from tandem (MS/MS) mass spectrometry data is an important and challenging problem that has attracted the attention of specialists in the field for a few decades. Most of the effort has been invested in retrieving target peptide sequences from bottom-up MS/MS data, leading to several handy software tools such as PEAKS [1], PepNovo [2], pNovo [3], Lutefisk [4], Sherenga [5], Vonode [6], Novor [7], the ALPS system [8], and a special-purpose program UVnovo [9], as well as a few alternative strategies that benefit from multiple enzyme digests [10,11,12,13,14], or from pairs [15,16,17,18,19] or triples [20] of spectra acquired using different fragmentation techniques. Despite those achievements, database search is commonly considered a substantially more reliable approach to protein identification, and remains the preferred choice if a database is available; the most widely used tools to this end in the bottom-up and top-down case are Sequest [21]. The Twister approach [31,32],
Proteomes 2017, 5, 6; doi:10.3390/proteomes5010006 www.mdpi.com/journal/proteomes