Peptide-level quantification using mass spectrometry (MS) is no trivial task because the physicochemical properties of a peptide affect both its response and its detectability. The specific amino acid (AA) sequence determines these properties; however, the connection between sequence and intensity output remains poorly understood. In this work, we explore combinations of amino acid pairs (i.e., dimer motifs) to determine a potential relationship between the local amino acid environment and MS1 intensity. For this purpose, a deep learning (DL) model, consisting of an encoder-decoder with an attention mechanism, was built. The attention mechanism made it possible to identify the most relevant motifs. Two patterns were consistently found to be important for predicting MS1 intensity: a bulky/aromatic and hydrophobic AA followed by a cationic AA, and consecutive bulky/aromatic and hydrophobic AAs. Correlating attention weights with mean MS1 intensities revealed that some important motifs, particularly those containing Trp, His, and Cys, were linked with low-responding peptides, whereas motifs containing Lys and most bulky hydrophobic AAs were often associated with high-responding peptides. Moreover, Asn-Gly was associated with low response. The model predicts MS1 response with a mean average percentage error of ∼11% and a Pearson correlation coefficient of ∼0.64. While the dimer representation of peptide sequences did not improve predictive capacity compared to the single-AA representation used in earlier work, this work adds valuable insight for a better understanding of peptide response in MS analysis.

Significance

Mass spectrometry is not inherently quantitative, and the response of a compound relies not only on its concentration but also on its molecular composition. For mass spectrometry-based analysis of peptides, such as in bottom-up proteomics, this implies that the response cannot be used directly to quantify individual peptides.
Moreover, the dependency of the response on the amino acid sequence of individual peptides remains poorly understood. Using a deep learning model based on a recurrent neural network with an attention mechanism, we here investigate how the presence of dimer motifs within a peptide affects the MS1 response, through the analysis of peptide pools intended to be equimolar and comprising almost 200,000 unique peptides in total. We identify certain dimer classes and specific dimers that substantially affect the MS1 response, and the model is also able to predict peptide intensity with low error rates within the independent test subset. The findings improve our understanding of the link between sequence and response for peptides and highlight the potential of deep learning for developing methods that allow absolute, label-free peptide quantification.
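To make the dimer representation and the two reported metrics concrete, the following is a minimal Python sketch. It is not the authors' implementation. It assumes that dimers are overlapping adjacent residue pairs and that the reported percentage error is the standard mean absolute percentage error; the function names and the toy intensity values are hypothetical.

```python
import math


def dimers(sequence):
    """Split a peptide into overlapping adjacent amino acid pairs.

    Assumption: overlapping pairs, e.g. 'PEPK' -> ['PE', 'EP', 'PK'].
    """
    return [sequence[i:i + 2] for i in range(len(sequence) - 1)]


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed must be nonzero)."""
    errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy, made-up log-intensities purely for illustration (not data from the study).
observed = [10.0, 20.0, 15.0]
predicted = [11.0, 18.0, 15.0]

print(dimers("PEPK"))
print(mape(observed, predicted))
print(pearson(observed, predicted))
```

In a sequence model, each dimer would typically be mapped to an integer token or embedding before being passed to the encoder; that step is omitted here.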