Computationally aided protein identification from mass spectrometry (MS) data has been a challenge for the proteomic community since the field's inception. As the community attempts to address challenges such as computational site assignment of posttranslational modifications (PTMs), selective reaction monitoring data analysis, and the transition of proteomic assays to the clinic, a robust framework with defined metrics is essential. The frameworks currently used for protein identification are score based, neither hypothesis driven nor deterministic. They grade identifications from poor to good but cannot recognize when a given tandem MS spectrum contains insufficient information, in the context of the proteome, to be identified. Means of computing deterministic and pseudodeterministic assays have been proposed by both us and others [Sherman, J., McKay, M.J., Ashman, K., Molloy, M.P., Unique ion signature mass spectrometry, a deterministic method to assign peptide identity. Mol. Cell. Proteomics 2009, 8, 2051-2062; Kiyonami, R., Schoen, A., Prakash, A., Peterman, S., et al., Increased selectivity, analytical precision, and throughput in targeted proteomics. Mol. Cell. Proteomics 2011, 10, M110.002931]. We believe it is crucial to incorporate the complexity of the proteome into the upstream design of the experiment and data analysis, and that doing so will make proteomic data analysis more robust. Claude Shannon's seminal 1948 paper provides a robust mathematical framework, now called Information Theory, for achieving this goal. Information Theory defines appropriate metrics for information and complexity, as well as means to account for interference, all of which are directly applicable to peptide identification. We aim to encourage the adoption of Information Theory as a means to ensure that appropriate metrics are used to measure the uncertainty in a peptide identification, with or without PTM site assignment.