Abstract
Background: Proteogenomics aims to use experimental proteome information to refine genome annotation. Because mass spectrometry-based shotgun proteomics provides large-scale peptide sequencing data at high throughput, a data repository for shotgun proteogenomics would be a valuable source of translation-level gene expression evidence for genome re-annotation.

Description: Here, we present OryzaPG-DB, a rice proteome database based on shotgun proteogenomics that incorporates the genomic features of experimental shotgun proteomics data. This version of the database was built from the results of 27 nanoLC-MS/MS runs of tryptic digests from undifferentiated cultured rice cells, performed on a high-accuracy hybrid ion trap-orbitrap mass spectrometer. Peptides were identified by searching the product ion spectra against the protein, cDNA, transcript and genome databases from Michigan State University, and were then mapped to the rice genome. These peptides covered approximately 3200 genes, 40 of which contained novel genomic features. Users can search, browse or download the database by chromosome, gene, protein, cDNA or transcript, and can download the updated annotations in standard GFF3 format, with visualizations in PNG format. In addition, the OryzaPG database schema was designed to be generic, so it can be reused to host similar proteogenomic information for other species. OryzaPG is the first proteogenomics-based database of the rice proteome; it provides peptide-based expression profiles together with each peptide's genomic origin, including an annotation of its novelty.

Conclusions: The OryzaPG database has been constructed and is freely available at http://oryzapg.iab.keio.ac.jp/.
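The abstract states that updated annotations can be downloaded in standard GFF3 format. As a minimal sketch of what one peptide-level record in such a file could look like, the Python below writes a single tab-separated, nine-column GFF3 line with 1-based inclusive coordinates and percent-encodes the characters that GFF3 reserves in column 9. The feature type `polypeptide_region`, the source tag and the `peptide_sequence` and `novel` attribute names are assumptions for illustration; they are not taken from OryzaPG-DB's actual export, and the seqid, coordinates and peptide are made up.

```python
# Hypothetical sketch: serialize one peptide-to-genome match as a GFF3 feature line.
# Feature type, source tag and attribute names are illustrative assumptions,
# not the OryzaPG-DB export format.

# Characters that must be percent-encoded inside GFF3 column 9 values.
_RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26",
             ",": "%2C", "\t": "%09", "\n": "%0A", "\r": "%0D"}


def escape_attr(value: str) -> str:
    """Percent-encode GFF3-reserved characters one character at a time."""
    return "".join(_RESERVED.get(ch, ch) for ch in value)


def peptide_to_gff3(seqid: str, start: int, end: int, strand: str,
                    peptide_id: str, sequence: str, novel: bool,
                    source: str = "proteogenomics") -> str:
    """Return a 9-column, tab-separated GFF3 line (1-based, inclusive coordinates)."""
    if start < 1 or end < start:
        raise ValueError("GFF3 coordinates are 1-based with start <= end")
    if strand not in {"+", "-", ".", "?"}:
        raise ValueError(f"invalid strand: {strand!r}")
    attributes = [
        ("ID", peptide_id),
        ("peptide_sequence", sequence),            # hypothetical attribute name
        ("novel", "true" if novel else "false"),   # hypothetical attribute name
    ]
    column9 = ";".join(f"{key}={escape_attr(val)}" for key, val in attributes)
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join([seqid, source, "polypeptide_region", str(start), str(end),
                      ".", strand, ".", column9])


if __name__ == "__main__":
    print("##gff-version 3")
    # A 10-residue peptide spans 30 nt, so 10001..10030 on the + strand.
    print(peptide_to_gff3("Chr1", 10001, 10030, "+", "pep0001", "LVNELTEFAK", novel=True))
```

Escaping character by character means a literal `%` in a value is encoded exactly once, which a chain of `str.replace` calls can get wrong if `%` is not handled first.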
Highlights
Proteogenomics aims to utilize experimental proteome information for refinement of genome annotation
We developed PGFeval (ProteoGenomic Features Evaluator), an evaluation and visualization tool written in Perl with the GD library (http://www.libgd.org) that evaluates the genomic novelty of each peptide and draws the whole gene model, graphically annotated with the genomic novelty of the peptides
The protein, cDNA and transcript information (IDs, aliases, descriptions, lengths and sequences) was extracted from the FASTA and GFF3 files obtained from the Michigan State University (MSU) website and MSU genome browser, and converted to tables using Perl scripts
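PGFeval is described as evaluating the genomic novelty of each peptide, but its actual rules are not given here. The following is only a hypothetical sketch of one way such an evaluation could work: a genome-mapped peptide, given as one or more 1-based blocks (more than one if it spans an intron), is compared with the CDS segments of annotated transcripts. The category names (`annotated`, `extends_annotation`, `novel_splice_junction`, `unannotated_region`) are invented for this example, and strand, reading-frame and isoform-specific checks are deliberately left out.

```python
from typing import Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates (start, end)


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_len(a: Interval, b: Interval) -> int:
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def classify_peptide(blocks: List[Interval], transcripts: List[List[Interval]]) -> str:
    """Hypothetical novelty label for one genome-mapped peptide.

    blocks: the peptide's genomic blocks, assumed not to overlap each other.
    transcripts: CDS segments of each annotated transcript at this locus.
    """
    annotated = merge(seg for tx in transcripts for seg in tx)
    total = sum(end - start + 1 for start, end in blocks)
    # Blocks and merged segments are both disjoint, so covered <= total.
    covered = sum(overlap_len(b, seg) for b in blocks for seg in annotated)
    if covered == 0:
        return "unannotated_region"
    if covered < total:
        return "extends_annotation"
    # Fully inside annotated CDS: every junction the peptide spans must join
    # two consecutive CDS segments of some annotated transcript.
    junctions: Set[Tuple[int, int]] = set()
    for tx in transcripts:
        segs = sorted(tx)
        junctions.update((a[1], b[0]) for a, b in zip(segs, segs[1:]))
    ordered = sorted(blocks)
    for a, b in zip(ordered, ordered[1:]):
        if (a[1], b[0]) not in junctions:
            return "novel_splice_junction"
    return "annotated"


if __name__ == "__main__":
    gene = [[(100, 200), (300, 400)]]  # one made-up transcript with two CDS exons
    for blocks in ([(150, 179)], [(190, 200), (300, 318)],
                   [(190, 200), (350, 368)], [(190, 219)], [(500, 529)]):
        print(blocks, classify_peptide(blocks, gene))
```

Merging the CDS segments of all isoforms before measuring coverage avoids double-counting bases shared by alternative transcripts, while the junction check still uses each transcript's own exon order.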
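The table-construction highlight describes extracting IDs, descriptions, lengths and sequences from FASTA files with Perl scripts. The same idea, sketched in Python for consistency with the other examples here, parses a FASTA stream and writes one tab-separated row per record. It assumes the ID is the first whitespace-delimited token of the header and everything after it is the description; real MSU headers may need source-specific parsing, and aliases from the GFF3 files are not handled. The records in the usage example are made up.

```python
import csv
import io
import sys
from typing import Iterator, TextIO, Tuple


def split_header(header: str) -> Tuple[str, str]:
    """Split a FASTA header into (ID, description) at the first whitespace.

    Assumption: the ID is the first whitespace-delimited token.
    """
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1].strip() if len(parts) > 1 else ""
    return identifier, description


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (ID, description, sequence) for each record in a FASTA stream."""
    header, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield (*split_header(header), "".join(chunks))
            header, chunks = line[1:], []
        elif header is not None:  # ignore any text before the first header
            chunks.append(line)
    if header is not None:
        yield (*split_header(header), "".join(chunks))


def fasta_to_tsv(src: TextIO, dst: TextIO) -> int:
    """Write a header row plus one row per record; return the record count."""
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "description", "length", "sequence"])
    count = 0
    for identifier, description, sequence in read_fasta(src):
        writer.writerow([identifier, description, len(sequence), sequence])
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO(">tx1.1 example protein\nMSTE\nVK\n>tx2.1\nMA\n")
    fasta_to_tsv(sample, sys.stdout)
```

Sequence lines are joined before the length is taken, so wrapped FASTA records report their full length rather than the length of the first line.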
Summary
Among high-throughput experimental methods, genome sequencing represents a turning point in the understanding of biological systems. Mass spectrometry-based proteomics, as an experimental approach to measuring proteins, can provide translation-level expression evidence for predicted protein-coding genes; this use of large-scale proteome data to refine genome annotation is the so-called proteogenomics approach [3,8,13,14]. This approach appears to be the best option for identifying and validating protein-coding genes, or at least a significant portion of them, in an independent and unambiguous way. Unlike the currently available rice proteome database [27], which provides a 2D-PAGE-based proteome, OryzaPG-DB contains peptides obtained by shotgun proteomics together with their product ion spectra and updated annotations, presented side by side with the corresponding protein, cDNA, transcript and genomic sequences and information.
Proteogenomics analysis to find novel genomic features
We performed proteogenomic data analysis using bioinformatics approaches to map the identified peptides
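The summary describes mapping identified peptides back to the rice genome. One common way to place an unspliced peptide on a genomic sequence, shown here as an assumption rather than the OryzaPG pipeline, is to translate the sequence in all six reading frames and convert each match back to 1-based genomic coordinates. Peptides that cross an intron will not be found this way, and the toy sequence below is made up.

```python
from typing import List, Tuple

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [x + y + z for x in _BASES for y in _BASES for z in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO))
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str, frame: int) -> str:
    """Translate from offset `frame`; codons with non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def find_all(text: str, sub: str) -> List[int]:
    """All start indices of `sub` in `text`, including overlapping matches."""
    hits, start = [], 0
    while True:
        k = text.find(sub, start)
        if k == -1:
            return hits
        hits.append(k)
        start = k + 1


def map_peptide(genome: str, peptide: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) hits, 1-based inclusive on the forward strand."""
    if not peptide:
        raise ValueError("peptide must be non-empty")
    dna = genome.upper()
    n, span = len(dna), 3 * len(peptide)
    rev = dna.translate(_COMPLEMENT)[::-1]  # reverse complement
    hits: List[Tuple[int, int, str]] = []
    for frame in range(3):
        for k in find_all(translate(dna, frame), peptide):
            s = frame + 3 * k  # 0-based start of the first codon
            hits.append((s + 1, s + span, "+"))
        for k in find_all(translate(rev, frame), peptide):
            s = frame + 3 * k  # 0-based start on the reverse complement
            # Reverse-complement index i corresponds to forward index n - 1 - i.
            hits.append((n - (s + span) + 1, n - s, "-"))
    return hits


if __name__ == "__main__":
    toy = "ATGGCTAAACCC"  # made-up 12-nt sequence
    print(map_peptide(toy, "AKP"))  # [(4, 12, '+')]
    print(map_peptide(toy, "FS"))   # [(4, 9, '-')]
```

Because leucine and isoleucine are isobaric, MS/MS identification usually cannot distinguish them, so a practical search would treat `I` and `L` as equivalent; this sketch matches them literally.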