Abstract
Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. Here we describe RACE-Seq, an experimental workflow designed to address this based on RACE (rapid amplification of cDNA ends) and long-read RNA sequencing. We apply RACE-Seq to 398 human lncRNA genes in seven tissues, leading to the discovery of 2,556 on-target, novel transcripts. About 60% of the targeted loci are extended in either 5′ or 3′, often reaching genomic hallmarks of gene boundaries. Analysis of the novel transcripts suggests that lncRNAs are as long, have as many exons and undergo as much alternative splicing as protein-coding genes, contrary to current assumptions. Overall, we show that RACE-Seq is an effective tool to annotate an organism's deep transcriptome, and compares favourably to other targeted sequencing techniques.
Highlights
Long non-coding RNAs constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome
We extended 176 lncRNA loci at the 5′ end and 193 loci at the 3′ end, out of a total of 398 GENCODE v7 loci targeted for extension (Fig. 2a)
Our results suggest that these are artifacts arising from inaccurate annotation of lncRNA transcript structures, since the biases towards both two-exon transcripts and isoform-poor genes disappear in the post-RACE-Seq transcripts (Fig. 4b,c)
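As a quick sanity check on the extension counts quoted above, the short sketch below converts them to percentages and computes the range within which the share of loci extended at either end must fall. The variable and function names are illustrative only, and the overlap between 5′-extended and 3′-extended loci is not given in this text, so only bounds are derived, not an exact figure.

```python
# Counts quoted in the highlights (loci targeted from GENCODE v7).
TARGETED_LOCI = 398
EXTENDED_5P = 176   # loci extended at the 5' end
EXTENDED_3P = 193   # loci extended at the 3' end


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100.0 * count / total, 1)


def union_bounds(a, b, total):
    """Smallest and largest possible number of loci extended at either end.

    The lower bound assumes one set is contained in the other; the upper
    bound assumes the two sets do not overlap (capped at the total).
    """
    return max(a, b), min(a + b, total)


pct_5p = percent(EXTENDED_5P, TARGETED_LOCI)
pct_3p = percent(EXTENDED_3P, TARGETED_LOCI)
low, high = union_bounds(EXTENDED_5P, EXTENDED_3P, TARGETED_LOCI)

print(f"5' extended: {pct_5p}%")
print(f"3' extended: {pct_3p}%")
print(f"either end: between {percent(low, TARGETED_LOCI)}% "
      f"and {percent(high, TARGETED_LOCI)}%")
```

The "about 60%" figure reported in the abstract lies inside this range, so the per-end counts and the combined figure are mutually consistent.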
Summary
Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. A high-throughput sequencing method called CaptureSeq was used for lncRNA characterization, in conjunction with Illumina short-read sequencing. It achieves targeted transcript enrichment by the hybridization of cDNA (derived from cellular RNA) to bead-linked oligonucleotide probes that are tiled and complementary to exons[19,20]. RNA CaptureSeq proved to be effective for the discovery of novel lowly expressed transcripts and allows for their quantification and assembly. This procedure has not been designed to address the proper definition of 5′ and 3′ transcript ends, and as a result other methods are required for the precise experimental annotation of gene boundaries. We here apply RACE-Seq to a selection of 398 lncRNA loci from the reference GENCODE v7.