Evolution of the unspliced transcriptome.

Jan Engelhardt, Peter F Stadler

doi:10.1186/s12862-015-0437-7

Abstract

Background: Despite their abundance, unspliced EST data have received little attention as a source of information on non-coding RNAs. Very little is known, therefore, about the genomic distribution of unspliced non-coding transcripts and their relationship with the much better studied regularly spliced products. In particular, their evolution has remained virtually unstudied.

Results: We systematically study the evidence on unspliced transcripts available in EST annotation tracks for human and mouse, comprising 104,980 and 66,109 unspliced EST clusters, respectively. Roughly one third of these are located totally inside introns of known genes (TINs) and another third overlap exonic regions (PINs). Eleven percent are “intergenic”, far away from any annotated gene. Direct evidence for the independent transcription of many PINs and TINs is obtained from CAGE tag and chromatin data. We predict more than 2000 3’UTR-associated RNA candidates each for human and mouse. Fifteen to twenty percent of the unspliced EST clusters are conserved between human and mouse. With the exception of TINs, the sequences of unspliced EST clusters evolve significantly more slowly than genomic background. Furthermore, like spliced lincRNAs, they show highly tissue-specific expression patterns.

Conclusions: Unspliced long non-coding RNAs are an important, rapidly evolving component of mammalian transcriptomes. Their analysis is complicated by their preferential association with complex transcribed loci that usually also harbor a plethora of spliced transcripts. Unspliced EST data, although typically disregarded in transcriptome analysis, can be used to gain insights into this rarely investigated transcriptome component. The frequently postulated connection between lack of splicing and nuclear retention and the surprising overlap of chromatin-associated transcripts suggest that this class of transcripts might be involved in chromatin organization and possibly other mechanisms of epigenetic control.

Electronic supplementary material: The online version of this article (doi:10.1186/s12862-015-0437-7) contains supplementary material, which is available to authorized users.
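The positional classes named in the abstract (TIN, PIN, “intergenic”) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Gene` record, the `classify_cluster` function, the half-open coordinates, the decision to ignore strand, and the 10 kb threshold for “far away” are all assumptions made here for illustration. Clusters that fit none of the three classes are returned as "other".

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


@dataclass
class Gene:
    """Hypothetical annotation record: gene span plus its exon intervals."""
    start: int
    end: int
    exons: List[Interval]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def classify_cluster(cluster: Interval, genes: List[Gene], far: int = 10_000) -> str:
    """Assign an unspliced EST cluster to TIN, PIN, intergenic, or other.

    `far` is an assumed distance threshold; the source only says "far away".
    """
    hit_genes = [g for g in genes if overlaps(cluster, (g.start, g.end))]
    if hit_genes:
        # Any overlap with an annotated exon makes the cluster a PIN.
        if any(overlaps(cluster, e) for g in hit_genes for e in g.exons):
            return "PIN"
        # Inside a gene span without touching an exon: totally intronic.
        if any(g.start <= cluster[0] and cluster[1] <= g.end for g in hit_genes):
            return "TIN"
        return "other"
    # No gene overlap: measure the gap to the nearest annotated gene.
    gaps = [max(g.start - cluster[1], cluster[0] - g.end) for g in genes]
    if not gaps or min(gaps) >= far:
        return "intergenic"
    return "other"


# Made-up example gene with three exons.
gene = Gene(1000, 5000, [(1000, 1500), (3000, 3500), (4500, 5000)])
labels = [
    classify_cluster(c, [gene])
    for c in [(2000, 2500), (1400, 2000), (50_000, 50_500), (6000, 6500)]
]
```

A genome-scale version would use an interval index per chromosome and strand rather than a linear scan over all genes.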

Highlights

Despite their abundance, unspliced EST data have received little attention as a source of information on non-coding RNAs.
A possible reason for this strong association with pre-existing annotation could be that unspliced EST (uEST) clusters are just by-products of “normal” spliced transcripts, arising from occasionally inefficient splicing or a background of not yet processed primary transcripts.
The correlation of the amount of spliced and unspliced ESTs is indicative of the overall coupling between spliced and unspliced expression.
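As a rough illustration of the last highlight, the coupling between spliced and unspliced expression could be quantified by correlating per-locus EST counts. The sketch below assumes a Pearson correlation on log-transformed counts. The source does not state which correlation measure or transformation was used, and the counts are made-up numbers, not data from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-locus EST counts (illustrative only, not from the paper).
spliced_counts = [120, 15, 300, 42, 8]
unspliced_counts = [30, 4, 85, 10, 3]

# EST counts are heavy-tailed, so compare them on a log scale (an assumption).
log_spliced = [math.log1p(c) for c in spliced_counts]
log_unspliced = [math.log1p(c) for c in unspliced_counts]
r = pearson(log_spliced, log_unspliced)
```

A value of `r` close to 1 would indicate that loci with many spliced ESTs also tend to have many unspliced ESTs. A rank-based measure would be a reasonable alternative if the relationship is monotonic but not linear.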

Summary

Introduction

Despite their abundance, unspliced EST data have received little attention as a source of information on non-coding RNAs. Very little is known about the genomic distribution of unspliced non-coding transcripts and their relationship with the much better studied regularly spliced products. Nuclear retained ncRNAs are often spliced transcripts but not polyadenylated. These “dark matter RNAs”, which have remained largely un-annotated so far, can be the dominating non-ribosomal RNA component in a mammalian cell [6, 7]. The overwhelming majority of lncRNAs for which detailed functional information is available are spliced; see, e.g., [18]. Splicing often tends to occur only after transcription and is less efficient [19].

Methods

Results

Conclusion