Changes in protein expression are hallmarks of development and disease. Protein expression can be established qualitatively and quantitatively using mass spectrometry (MS). Samples are prepared, and proteins are extracted and then analyzed using MS and MS/MS. The resulting spectra must be processed computationally to assign peptide-spectrum matches. Database searches employ sequence databases or spectral libraries to match candidate peptides to the measured spectra. This route is well established but fails when peptides are not found in sequence repositories. In such cases, de novo sequencing of MS/MS spectra can be employed. Many computational algorithms are available that establish the peptide sequence from the MS/MS spectrum alone. Beyond assigning a sequence to an MS/MS spectrum, de novo sequencing yields assignments that can be used downstream for genome annotation. For example, novel exons can be assigned, known exons can be extended, and splice sites can be validated at the protein level. We compiled an extensive list of de novo sequencing algorithms, grouped them, and discussed selected approaches. We also provide a roadmap for how de novo sequencing can enter mainstream proteogenomic analysis. In the future, de novo predictions can be added to sample-specific protein databases, including RNA-seq translations. These enriched databases can then be used for proteogenomics studies with existing pipelines.
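To illustrate the basic idea behind establishing a peptide sequence from an MS/MS spectrum alone, the following minimal Python sketch reads residues from the mass differences between consecutive fragment peaks. It is not any of the surveyed algorithms. It assumes an idealized, noise-free ladder of singly charged b-ions, and the helper names `b_ion_ladder` and `read_ladder` are hypothetical. Isoleucine is left out because it is isobaric with leucine, and any gap that does not match exactly one residue is reported as `?`.

```python
# Monoisotopic residue masses in daltons (standard values).
# Isoleucine is omitted on purpose: it has the same mass as leucine.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # mass of the charge-carrying proton, in daltons


def b_ion_ladder(peptide):
    """Return idealized singly charged b-ion m/z values b1..bn (hypothetical helper)."""
    ladder, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def read_ladder(peaks, tol=0.01):
    """Read one residue per gap between consecutive peaks.

    A gap is assigned only if exactly one residue mass lies within `tol`
    daltons of it; otherwise the position is reported as '?'.
    """
    peaks = sorted(peaks)
    sequence = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]
        sequence.append(hits[0] if len(hits) == 1 else "?")
    return "".join(sequence)


# Prepend the bare proton as a "b0" anchor so the first residue can be read too.
complete = [PROTON] + b_ion_ladder("SAMPLE")
full_read = read_ladder(complete)

# Drop one peak to mimic a missing fragment: the gap now spans two residues.
incomplete = complete[:2] + complete[3:]
gapped_read = read_ladder(incomplete)
```

With the complete ladder the sequence is recovered residue by residue, while the ladder with a missing peak leaves one unresolved position. Handling such gaps, noise, other ion series, and modifications is what practical de novo sequencing algorithms have to address.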