Abstract
MS/MS has been used to improve genome annotation in various organisms. The classical approach is to construct a comprehensive theoretical peptide database from all ORFs of a genome using a six-frame translation model, and to search real MS/MS spectra against this database. In this work we took a more focused approach: we constructed a database containing only peptides from the ab initio predicted genes in the current human genome annotation, together with all theoretical peptides from currently annotated lncRNAs, and searched this database with MS/MS data from the human HeLa cell line. The purpose of this design is to find translation evidence for ab initio predicted genes and to rule out lncRNAs that may have been wrongly defined, as part of a systematic proteogenomics effort. To validate the proteogenomics results, we integrated RNA-Seq data from the same HeLa cell line that generated the MS/MS data, and performed MRM experiments on HeLa cell samples cultured in-house. Six peptides supporting ab initio predicted genes were validated by both RNA-Seq and MRM, while no peptide was found to support a translated lncRNA. This workflow can be flexibly applied to other human samples and datasets to help further improve human gene annotation.
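As a rough illustration of the database-construction idea mentioned above, the sketch below translates a nucleotide sequence in all six reading frames and splits the result into candidate peptides. It is not the pipeline used in this work: the function names are hypothetical, the standard genetic code is assumed, and trypsin-like cleavage (after K or R, but not before P) with a minimum peptide length is an assumption made only for the example.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, template in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


def tryptic_peptides(protein, min_len=6):
    """Split at stop codons, then cleave after K/R unless followed by P.

    Assumption for illustration: trypsin-like rules, no missed cleavages.
    """
    peptides = []
    for segment in protein.split("*"):
        start = 0
        for i, aa in enumerate(segment):
            at_end = i == len(segment) - 1
            if at_end or (aa in "KR" and segment[i + 1] != "P"):
                peptide = segment[start:i + 1]
                if len(peptide) >= min_len:
                    peptides.append(peptide)
                start = i + 1
    return peptides


# Usage: collect candidate peptides from every frame of a toy sequence.
toy_dna = "ATGGCCAAGTAA"
for frame, protein in six_frame_translations(toy_dna).items():
    print(frame, protein, tryptic_peptides(protein, min_len=1))
```

A focused database of the kind described here would apply the peptide-generation step only to selected sequences (predicted gene models and lncRNA transcripts) rather than to the whole genome, which keeps the search space much smaller.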