Abstract

Background: Database search has been the main approach for proteoform identification by top-down tandem mass spectrometry. However, when the target proteoform that produced the spectrum contains post-translational modifications (PTMs) and/or mutations, aligning a query spectrum against every unmodified protein sequence in a large database is time-consuming. Consequently, efficient and sensitive filtering algorithms are essential for speeding up database search.

Results: In this paper, we propose a protein sequence filtering method for top-down mass spectral identification based on spectrum graph matching (SGM). It uses subspectra of a query spectrum to generate spectrum graphs and searches them against a protein database to report the best candidate sequences. The sequence tag and gapped tag approaches require a preprocessing step to extract and select tags. The SGM filtering method avoids this step, which simplifies data processing. We evaluated the filtration efficiency of the SGM filtering method with various parameter settings on an Escherichia coli top-down mass spectrometry data set. We also compared its performance with that of two tag-based filtering methods on a data set of MCF-7 cells.

Conclusions: Experimental results on these data sets show that the SGM filtering method achieves high sensitivity in protein sequence filtration. When coupled with a spectral alignment algorithm, the SGM filtering method significantly increases the number of identified proteoform spectrum matches compared with the tag-based methods in top-down mass spectrometry data analysis.
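The abstract does not spell out the SGM algorithm. The following is therefore only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It makes these assumptions:

- Peaks have already been converted to prefix residue masses.
- A "subspectrum" is a window of consecutive peaks. The `window` and `step` parameters are hypothetical.
- A spectrum graph joins two peaks whenever their mass difference matches one amino acid residue mass within a tolerance `tol`.

Each protein is scored by the longest graph path whose edge labels spell a contiguous stretch of its sequence, summed over all subspectra, and the top-scoring proteins are kept. The function names `build_spectrum_graph`, `match_score`, `subspectra`, and `sgm_filter` are illustrative only.

```python
from itertools import accumulate

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_spectrum_graph(masses, tol=0.02):
    """Nodes are sorted prefix residue masses (an assumption); an edge u -> v is
    labeled with every residue whose mass matches mass[v] - mass[u] within tol."""
    nodes = sorted(masses)
    edges_into = [[] for _ in nodes]  # edges_into[v] = [(u, labels), ...]
    for j, mj in enumerate(nodes):
        for i in range(j):
            diff = mj - nodes[i]
            labels = {aa for aa, m in RESIDUE_MASS.items() if abs(diff - m) <= tol}
            if labels:
                edges_into[j].append((i, labels))
    return nodes, edges_into


def match_score(edges_into, seq):
    """Number of residues in the longest graph path whose edge labels spell a
    contiguous substring of seq, found by dynamic programming over seq."""
    n_nodes = len(edges_into)
    prev = [0] * n_nodes  # prev[v]: longest match ending at node v and at the previous residue
    best = 0
    for residue in seq:
        cur = [0] * n_nodes
        for v in range(n_nodes):
            for u, labels in edges_into[v]:
                if residue in labels:
                    cur[v] = max(cur[v], prev[u] + 1)
        best = max(best, max(cur, default=0))
        prev = cur
    return best


def subspectra(masses, window=8, step=4):
    """Hypothetical subspectrum scheme: overlapping windows of consecutive peaks."""
    peaks = sorted(masses)
    if len(peaks) <= window:
        return [peaks]
    starts = list(range(0, len(peaks) - window + 1, step))
    if starts[-1] + window < len(peaks):
        starts.append(len(peaks) - window)  # make sure the last peaks are covered
    return [peaks[s:s + window] for s in starts]


def sgm_filter(spectrum_masses, database, top_k=2, tol=0.02):
    """Score every protein against the graphs of all subspectra and keep the top_k."""
    graphs = [build_spectrum_graph(sub, tol)[1] for sub in subspectra(spectrum_masses)]
    scores = {name: sum(match_score(g, seq) for g in graphs)
              for name, seq in database.items()}
    # Highest score first; ties broken by protein name for a deterministic order.
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_k]


if __name__ == "__main__":
    # Toy query: exact prefix residue masses of the peptide PEPTIDEK.
    spectrum = [0.0] + list(accumulate(RESIDUE_MASS[aa] for aa in "PEPTIDEK"))
    database = {"P1": "MKPEPTIDEKAAG", "P2": "MGGSSWWRRHH", "P3": "ACDFGHIK"}
    print(sgm_filter(spectrum, database, top_k=2))  # ['P1', 'P3']
```

Summing scores over overlapping subspectra rewards proteins that several local graphs support. The toy query involves no tag extraction step: the graphs are matched against each sequence directly. A real filter would also have to deal with complementary ion types, noise peaks, and the mass shifts introduced by PTMs and mutations, none of which this sketch models.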

Highlights

  • Database search has been the main approach for proteoform identification by top-down tandem mass spectrometry

  • The spectrum graph matching (SGM) filtering algorithm was evaluated on two data sets

  • The first is a top-down mass spectrometry (MS) data set with 2027 collision-induced dissociation (CID) and 2027 electron-transfer dissociation (ETD) MS/MS spectra

Summary

Introduction

Top-down mass spectrometry (MS) is an important technology for identifying proteoforms with primary sequence alterations, such as post-translational modifications (PTMs) and mutations [1], because it provides a “bird’s-eye” view of whole proteoforms. Reliable identification of protein alterations plays an important role in understanding the biological mechanisms underlying diseases [2]. Represented by tools such as ProsightPC [3, 4] and TopPIC [5], database search [6, 7] is the dominant
