Abstract

Tandem mass spectrometry is an indispensable technology for identifying proteins in complex mixtures. Accurate and sensitive analysis of large volumes of mass spectral data is a principal challenge in proteomics. Conventional deep learning-based peptide identification models usually adopt an encoder-decoder framework and generate the target sequence from left to right without fully exploiting global information. A few recent approaches employ two-pass decoding, yet they remain limited when the spectra are heavily contaminated by noise. In this paper, we propose a new paradigm for improved peptide identification, which first retrieves a similar mass spectrum from the database as a reference and then revises the matched sequence according to the difference information between the reference spectrum and the current context. The design is motivated by the observation that the retrieved peptide-spectrum pair provides a good starting point and indirect access to both past and future information, so that each revised amino acid can be produced with better noise perception and global understanding. Moreover, a disturb-based optimization process is introduced, which uses reinforcement learning to sharpen the attention over the difference vector before it is fed to the decoder. Experimental results on several public datasets demonstrate that the proposed method yields a prominent performance boost. Remarkably, we achieve new state-of-the-art identification results on these datasets.
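The retrieve-then-revise idea can be illustrated with a minimal sketch. It is not the model proposed in the paper: it assumes spectra are binned into fixed-length, L2-normalized intensity vectors and that similarity is plain cosine similarity, and every name in it (`bin_spectrum`, `retrieve_reference`, the toy database) is hypothetical. The learned revision decoder and the reinforcement-learning step are not shown; the sketch only covers retrieving a reference and forming a difference vector between the query and that reference.

```python
import math

def bin_spectrum(peaks, max_mz=2000.0, bin_width=1.0):
    """Turn (m/z, intensity) pairs into a fixed-length, L2-normalized vector.

    Binning with a fixed width is an assumption made for this illustration.
    """
    n_bins = int(max_mz / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if 0 <= mz < max_mz:
            vec[int(mz / bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def cosine(a, b):
    """Dot product of two vectors; equals cosine similarity when both are unit length."""
    return sum(x * y for x, y in zip(a, b))

def retrieve_reference(query_vec, database):
    """Return (peptide, reference_vec, similarity) for the most similar database entry."""
    peptide, ref_vec = max(database, key=lambda entry: cosine(query_vec, entry[1]))
    return peptide, ref_vec, cosine(query_vec, ref_vec)

# Toy database of (peptide, binned spectrum) pairs; the values are made up.
database = [
    ("PEPTIDE", bin_spectrum([(100.2, 5.0), (250.7, 3.0), (400.1, 1.0)])),
    ("SAMPLER", bin_spectrum([(120.4, 4.0), (310.9, 2.0), (620.3, 6.0)])),
]

# A query that shares two peaks with the first entry and adds one new peak.
query = bin_spectrum([(100.3, 5.0), (250.6, 2.5), (455.0, 1.5)])

peptide, ref_vec, similarity = retrieve_reference(query, database)

# Difference between the query and the retrieved reference: positive entries are
# peaks present only in the query, negative entries are peaks present only in
# the reference. A revision model would consume this signal.
difference = [q - r for q, r in zip(query, ref_vec)]
```

In this toy setting the retrieved peptide serves as the starting sequence, and the non-zero entries of `difference` mark where the query deviates from the reference, which is the kind of signal a revision step would act on.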
