Abstract
An increasing number of studies involve integrative analysis of gene and protein expression data, taking advantage of new technologies such as next-generation transcriptome sequencing and highly sensitive mass spectrometry (MS) instrumentation. Recently, a strategy, termed ribosome profiling (or RIBO-seq), based on deep sequencing of ribosome-protected mRNA fragments, indirectly monitoring protein synthesis, has been described. We devised a proteogenomic approach constructing a custom protein sequence search space, built from both Swiss-Prot- and RIBO-seq-derived translation products, applicable for MS/MS spectrum identification. To record the impact of using the constructed deep proteome database, we performed two alternative MS-based proteomic strategies as follows: (i) a regular shotgun proteomic and (ii) an N-terminal combined fractional diagonal chromatography (COFRADIC) approach. Although the former technique gives an overall assessment at the protein and peptide level, the latter technique, specifically enabling the isolation of N-terminal peptides, is very appropriate for validating the RIBO-seq-derived (alternative) translation initiation site profile. We demonstrate that this proteogenomic approach increases the overall protein identification rate by 2.5% (e.g. new protein products, new protein splice variants, single nucleotide polymorphism variant proteins, and N-terminally extended forms of known proteins) as compared with only searching UniProtKB-SwissProt. Furthermore, using this custom database, identification of N-terminal COFRADIC data resulted in detection of 16 alternative start sites giving rise to N-terminally extended protein variants, in addition to the identification of four translated upstream ORFs. Notably, the characterization of these new translation products revealed the use of multiple near-cognate (non-AUG) start codons.
As deep sequencing techniques are becoming more standard, less expensive, and widespread, we anticipate that mRNA sequencing and especially custom-tailored RIBO-seq will become indispensable in the MS-based protein or peptide identification process. The underlying mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the dataset identifier PXD000124.
Highlights
From the ‡Laboratory of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000, Ghent, the Stem Cell Institute Leuven, Department of Development and Regeneration, Catholic University, Leuven, B-3000 Leuven, the ‡‡Department of Medical Protein Research, Flemish Institute for Biotechnology, B-9000 Ghent, and the §§Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
To record the impact of using the constructed deep proteome database, we performed two types of proteome analysis: (i) a regular shotgun proteomic approach and (ii) an N-terminal COFRADIC approach. The former gives an overall assessment at the protein and peptide level, whereas the latter, by enriching for N-terminal peptides, is highly suited for validating the RIBO-seq translation initiation site observations [15].
Shotgun Proteomics: Using the custom combined database as search space, the number of protein identifications increases by 2.64% as compared with searching the UniProtKB-SwissProt reference set only.
Summary
After MS/MS spectra acquisition, protein sequence database searching (Mascot [3], X!Tandem [4], and OMSSA [5], among others) is used for peptide identification. [...] the real protein pool of a specific sample or even be all-inclusive. A new strategy, termed ribosome profiling (or RIBO-seq), based on deep sequencing of ribosome-protected mRNA fragments and thereby monitoring protein synthesis, has been described [13, 14]. Ribosome profiling is more suitable than mRNA-seq for delineating the exact ORFs and deriving protein sequences, which are highly informative for creating a custom sequence search space for MS/MS-based peptide identification. For more than 65% of the annotated proteins, more than one translation initiation site was determined [15].
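The idea of a combined search space can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy accessions, and the toy sequences are all hypothetical, and real data would come from a Swiss-Prot FASTA export and from translated RIBO-seq ORF calls. The sketch merges the two sets of protein sequences, drops RIBO-seq entries whose sequence is already in the reference set, and flags RIBO-seq entries that look like N-terminally extended forms of a reference protein.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def build_search_space(reference: List[Record],
                       riboseq: List[Record]) -> List[Tuple[str, str, str]]:
    """Merge both sets; reference entries come first, and any later entry
    whose sequence was already seen is skipped."""
    seen = set()
    combined = []
    for source, records in (("ref", reference), ("ribo", riboseq)):
        for header, seq in records:
            if seq in seen:
                continue
            seen.add(seq)
            combined.append((source, header, seq))
    return combined


def is_n_terminal_extension(candidate: str, reference: str) -> bool:
    """True if candidate is strictly longer than reference and ends with it."""
    return len(candidate) > len(reference) and candidate.endswith(reference)


# Toy input (hypothetical accessions and sequences, for illustration only).
REFERENCE_FASTA = """>sp|P00001|TOY1
MKTAYIAK
>sp|P00002|TOY2
MSLLEQ
"""
RIBOSEQ_FASTA = """>ribo|tr1_ext
MAPGMKTAYIAK
>ribo|tr2
MSLLEQ
"""

reference = parse_fasta(REFERENCE_FASTA)
riboseq = parse_fasta(RIBOSEQ_FASTA)
combined = build_search_space(reference, riboseq)

# RIBO-seq entries that extend a reference protein at its N terminus.
extensions = [
    (ribo_header, ref_header)
    for ribo_header, ribo_seq in riboseq
    for ref_header, ref_seq in reference
    if is_n_terminal_extension(ribo_seq, ref_seq)
]
```

In this toy case the duplicate `ribo|tr2` entry is dropped, and `ribo|tr1_ext` is flagged as an extension of `sp|P00001|TOY1`. A suffix match is only a first-pass heuristic; it would miss extended forms that also carry a splice or SNP difference.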