Charging of analytes is a prerequisite for mass spectrometry analysis, and in proteomics electrospray ionization is the dominant technique for achieving it. Although differences in the peptide charge state distribution (CSD) are well known among experimentalists, the analytical value of the CSD remains underexplored. To investigate the utility of this dimension, we analyzed several public data sets comprising over 250,000 peptide CSD profiles from the human proteome. We found that the CSD dimension is highly reproducible across multiple laboratories, mass analyzers, and extended time intervals. In general, the CSD enabled effective partitioning of the peptide property space, resulting in enhanced discrimination between sequence and constitutional peptide isomers. Next, by evaluating the CSD values of phosphorylated peptides, we were able to differentiate between phosphopeptides that indicate the formation of intramolecular structures in the gas phase and those that do not. The reproducibility of the CSD values (mean cosine similarity above 0.97 for most of the experiments) made the CSD data suitable for training a deep-learning model capable of accurately predicting CSD values (mean cosine similarity of 0.98). When we applied the CSD dimension to MS1- and MS2-based proteomics experiments, we consistently observed an increase of around 5% in protein and peptide identification rates. Even though the CSD dimension is not as effective a discriminator as the widely used retention time dimension, it still holds potential for application in direct infusion proteomics.
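To make the reproducibility metric concrete, the following minimal sketch shows how a cosine similarity between two CSD profiles of the same peptide could be computed. It assumes that a CSD is represented as a vector of relative intensities, one entry per charge state; the abstract does not specify the exact representation, so the function names `normalize_csd` and `cosine_similarity` and the intensity values are hypothetical and purely illustrative.

```python
import math


def normalize_csd(intensities):
    """Convert per-charge-state intensities into fractions that sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("a CSD needs at least one positive intensity")
    return [value / total for value in intensities]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length CSD vectors."""
    if len(a) != len(b):
        raise ValueError("CSD vectors must cover the same charge states")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical intensities for charge states 1+ to 4+ of one peptide,
# measured in two separate runs (illustrative values, not from the study).
run_a = normalize_csd([0.0, 8.0e6, 2.0e6, 0.0])
run_b = normalize_csd([0.0, 7.5e6, 2.4e6, 1.0e5])

similarity = cosine_similarity(run_a, run_b)
print(f"CSD cosine similarity: {similarity:.4f}")
```

Because cosine similarity is insensitive to overall scale, the normalization step does not change the result here; it is included only so that each profile reads as a distribution over charge states.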