DeepLC can predict retention times for peptides that carry as-yet unseen modifications.

Robbin Bouwmeester, Niels Hulstaert, Ralf Gabriels, Sven Degroeve, Lennart Martens

doi:10.1038/s41592-021-01301-5

Open Access

https://doi.org/10.1038/s41592-021-01301-5

Abstract

The inclusion of peptide retention time prediction promises to remove peptide identification ambiguity in complex liquid chromatography-mass spectrometry identification workflows. However, due to the way peptides are encoded in current prediction models, accurate retention times cannot be predicted for modified peptides. This is especially problematic for fledgling open searches, which will benefit from accurate retention time prediction for modified peptides to reduce identification ambiguity. We present DeepLC, a deep learning peptide retention time predictor using peptide encoding based on atomic composition that allows the retention time of (previously unseen) modified peptides to be predicted accurately. We show that DeepLC performs similarly to current state-of-the-art approaches for unmodified peptides and, more importantly, accurately predicts retention times for modifications not seen during training. Moreover, we show that DeepLC's ability to predict retention times for any modification enables potentially incorrect identifications to be flagged in an open search of a wide variety of proteome data.

Highlights

Liquid Chromatography (LC) plays a critical role in Mass Spectrometry (MS) analysis in bottom-up proteomics[1].
We first evaluate the performance of DeepLC on retention time prediction for unmodified peptides, in comparison with state-of-the-art tools.
We evaluate DeepLC's performance on modified peptides in two distinct ways: (i) by testing DeepLC on modifications not seen during training, and (ii) through a novel type of evaluation that leaves out unmodified amino acids and has DeepLC treat these as modified glycines.
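
To make the idea of an encoding based on atomic composition more concrete, the following is a minimal sketch and not DeepLC's actual implementation. It assumes each residue is described only by its counts of C, H, N, O and S, and that a modification is a change in those counts. The names `ATOMS`, `RESIDUES` and `encode` are hypothetical and chosen for illustration. Under these assumptions, an alanine and a glycine carrying an extra CH2 group receive the same encoding, which is the intuition behind treating amino acids as modified glycines.

```python
# Illustrative sketch only: a per-residue atomic composition encoding.
# ATOMS, RESIDUES and encode() are hypothetical names, not DeepLC's API.

ATOMS = ("C", "H", "N", "O", "S")

# Atomic composition of amino acid residues (free amino acid minus water).
RESIDUES = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "M": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
}


def encode(sequence, mods=None):
    """Return one row of atom counts (ordered as ATOMS) per residue.

    mods maps a zero-based position to a dict of atom count changes,
    for example {0: {"O": 1}} for an oxidation on the first residue.
    """
    mods = mods or {}
    rows = []
    for position, residue in enumerate(sequence):
        counts = dict.fromkeys(ATOMS, 0)
        counts.update(RESIDUES[residue])
        # A modification simply shifts the atom counts of its residue.
        for atom, delta in mods.get(position, {}).items():
            counts[atom] += delta
        rows.append([counts[atom] for atom in ATOMS])
    return rows


# Oxidized methionine: one extra oxygen on the methionine residue.
oxidized_m = encode("M", {0: {"O": 1}})

# Alanine encoded directly, and as a glycine "modified" with CH2.
alanine = encode("GA")[1]
glycine_plus_ch2 = encode("GG", {1: {"C": 1, "H": 2}})[1]
```

Because the encoding depends only on atom counts, a modification that never occurred in the training data still maps onto features the model has already seen, which is what makes prediction for unseen modifications plausible in principle.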

Summary

Introduction

Liquid Chromatography (LC) plays a critical role in Mass Spectrometry (MS) analysis in bottom-up proteomics[1]. By separating peptides based on their physicochemical properties in the LC step, the complexity of the sample presented to the MS instrument is greatly reduced. This reduction results in less ionization competition, improved sensitivity for data-dependent and data-independent analysis, and reduced chimericity in fragmentation spectra (MS2)[2,3]. In addition to these benefits, the retention time measurement itself provides an additional dimension of information to interpret the signals generated by a peptide[4]. This information is, however, only available for peptides that have been observed before. To fill this knowledge gap, researchers have used models to predict retention times for previously unobserved peptides[4].

Methods

Results

Conclusion