Abstract

Long noncoding RNAs (lncRNAs) are intracellular transcripts that are longer than 200 nucleotides and lack protein-coding information. A subclass of lncRNAs, known as long intergenic noncoding RNAs (lincRNAs), is transcribed from genomic regions that do not overlap annotated protein-coding genes. Increasing evidence shows that some annotated lincRNA transcripts do in fact contain open reading frames (ORFs) that encode functional short peptides in the cell. Few robust methods for identifying lincRNA-encoded peptides have been reported, and the tissue-specific expression of these peptides remains largely unexplored. Here we propose an integrative workflow for lincRNA-encoded peptide discovery and test it on the mouse kidney inner medulla (IM). In brief, low-molecular-weight protein fractions were enriched from IM homogenates and trypsinized into shorter peptides, which were sequenced by high-resolution liquid chromatography-tandem mass spectrometry (LC-MS/MS). To curate a database of hypothetical lincRNA-encoded peptides for peptide-spectrum matching following LC-MS/MS, we performed RNA-Seq on IMs, computationally removed reads overlapping annotated protein-coding genes, and remapped the remaining reads to a database of mouse noncoding transcripts to infer lincRNA expression. Expressed lincRNAs were searched for ORFs with an existing rule-based algorithm, and the translated ORFs were used for peptide-spectrum matching. Peptides identified by LC-MS/MS were further evaluated using several quality control criteria and bioinformatics methods. We discovered three novel lincRNA-encoded peptides, which are conserved in mouse, rat, and human. The workflow can be adapted to discover small protein-coding genes in any species or tissue for which noncoding transcriptome information is available.
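To illustrate the database-building step, the following is a minimal sketch of how ORFs in a transcript could be found, translated, and digested in silico into tryptic peptides. It is not the rule-based algorithm used in the study, which the abstract does not name. The function names (`find_orfs`, `trypsin_digest`), the forward-strand-only scan, the ATG start rule, and the `min_aa` length cutoff are all assumptions made for illustration.

```python
# Illustrative sketch only: a simple ATG-to-stop ORF scan, not the study's algorithm.

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(orf):
    """Translate a nucleotide ORF (without its stop codon) into amino acids."""
    return "".join(
        CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3)
    )


def find_orfs(transcript, min_aa=10):
    """Return translated ORFs (ATG to stop) from the three forward frames.

    min_aa is a hypothetical length cutoff, not a value from the study.
    """
    seq = transcript.upper().replace("U", "T")
    peptides = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                protein = translate(seq[start:i])
                if len(protein) >= min_aa:
                    peptides.append(protein)
                start = None  # look for the next ORF in this frame
    return peptides


def trypsin_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    fragments, current = [], ""
    for i, residue in enumerate(protein):
        current += residue
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    toy_transcript = "CCATGGCTAAACGTCCGTGGTAAGG"  # made-up sequence
    for protein in find_orfs(toy_transcript, min_aa=3):
        print(protein, trypsin_digest(protein))
```

In a real search database, the translated ORFs would typically be written out in FASTA format and the digestion left to the peptide-spectrum matching software; the digest function here only shows which peptides LC-MS/MS would be expected to observe.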
