Integration of mass spectrometry and RNA‐Seq data to confirm human ab initio predicted genes and lncRNAs

Han Sun, Pengyuan Yang, Daixi Li, Lu Xie, Yixue Li, Mingwei Liu, Meng Shi, Chen Chen, Dandan Wang

doi:10.1002/pmic.201400174

Abstract

MS/MS has been used to improve genome annotation in various organisms. The classical approach is to construct a comprehensive theoretical peptide database by six-frame translation of all ORFs of a genome and to search real MS/MS spectra against this database. In this work, we took a more focused approach: we constructed a database containing only peptides from the ab initio predicted genes in the current human genome annotation, together with all theoretical peptides from currently annotated lncRNAs, and searched this database with MS/MS data from the human HeLa cell line. The purpose of this design is to find translation evidence for ab initio predicted genes and to rule out possibly misannotated lncRNAs in a systematic proteogenomics effort. To validate the proteogenomics results, we integrated RNA-Seq data analysis for the same HeLa cell line that generated the MS/MS data, and performed MRM experiments on self-cultured HeLa cell line samples. Six peptides were found to support ab initio predicted genes with both RNA-Seq and MRM validation, while none was found to support a translated lncRNA. This workflow could be flexibly applied to other human samples and datasets to help further improve human gene annotation.
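The database construction step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes six-frame translation with the standard genetic code, splitting at stop codons, and a simple in silico tryptic digest (cleave after K or R unless followed by P, no missed cleavages). All function names, the `min_len` cutoff, and the input dictionary of transcript sequences are hypothetical choices made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate three forward and three reverse-complement frames."""
    seq = seq.upper()
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def tryptic_peptides(protein, min_len=6):
    """Cleave after K/R unless the next residue is P (no missed cleavages).

    min_len is an illustrative cutoff, not a value taken from the paper.
    """
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptide = protein[start:i + 1]
            if len(peptide) >= min_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


def peptide_database(transcripts, min_len=6):
    """Map each theoretical peptide to the transcript names that yield it."""
    database = {}
    for name, seq in transcripts.items():
        for frame in six_frame_translations(seq):
            # Split at stop codons so no peptide spans a stop.
            for segment in frame.split("*"):
                for peptide in tryptic_peptides(segment, min_len):
                    database.setdefault(peptide, set()).add(name)
    return database
```

Calling `peptide_database({"lncRNA_1": "ATG..."})` with real transcript sequences would give a peptide-to-source mapping that a search engine could be pointed at. Keeping the source names per peptide makes it easy to tell whether a matched spectrum points to a predicted gene or to an lncRNA.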

Full Text