Abstract

If sequencing were possible only for genomes, and not for RNAs or proteins, functional protein-coding exons would still be recognizable by their unusual patterns of nucleotide composition: specifically, a high GC content across the body of exons and an unusual nucleotide content near their edges. RNAs and proteins can, of course, be sequenced, but the extent of functionality of intergenic long noncoding RNAs (lncRNAs) remains in question owing to their low nucleotide conservation. Inspired by the nucleotide composition patterns of protein-coding exons, we sought evidence for functionality across lncRNA loci from diverse species. We found that such patterns across multiexonic lncRNA loci mirror those of protein-coding genes, although to a lesser degree. Specifically, compared with introns, lncRNA exons are GC rich. Additionally, we report evidence for the action of purifying selection to preserve exonic splicing enhancers within human multiexonic lncRNAs and to preserve nucleotide composition in fruit fly lncRNAs. Our findings provide evidence for selection for more efficient rates of transcription and splicing within lncRNA loci. Despite only a minor proportion of their RNA bases being constrained, multiexonic intergenic lncRNAs appear to require accurate splicing of their exons to transact their function.

Highlights

  • Human multiexonic long (≥200 nt) noncoding RNA (lncRNA) loci are very modestly constrained in their exons, relative to their introns, which we have interpreted as implying either that their functions contribute little to organismal fitness or that their functionality is conveyed by only a small minority of their sequences (Ponjavic and Ponting 2007; Haerty and Ponting 2013)

  • The latter possibility is supported by our observations that, akin to protein-coding genes, (i) evolutionary constraint is more concentrated near human spliced lncRNA intron–exon boundaries, (ii) such regions contain an unusually high density of exonic splicing enhancers (ESEs), and (iii) these ESEs are unexpectedly preserved in orthologous sequence

Introduction

Nucleotide composition has long been known to vary greatly among long genomic regions (Eyre-Walker and Hurst 2001; Duret and Galtier 2009). Nucleotide compositional variation across exons has been associated with short motifs proximal to exon–intron boundaries that either enhance or inhibit splicing (Mount et al. 1992; Fairbrother et al. 2002; Wang et al. 2004). Although these features, and their molecular functions, are well known for protein-coding sequences, much remains to be learned for the thousands of intergenic long (≥200 nt) noncoding RNAs (lncRNAs) that have been predicted to be transcribed from animal genomes (Ulitsky and Bartel 2013). LncRNA loci are found in diverse genomic contexts (enhancer, promoter-associated, intergenic, intronic, antisense; for review, see Qureshi and Mehler 2012), and their transcripts, either polyadenylated or nonpolyadenylated, can be located in diverse cellular compartments (Derrien et al. 2012; van Heesch et al. 2014). They vary widely in size, ranging from as few as 200 nt to >8 kb for loci such as Malat. In addition, lncRNA loci can be composed either of a single exon (for example, Malat and Paupar) or of multiple exons (for example, Hotair and Xist).
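To make the compositional contrast between exons and introns concrete, the following is a minimal Python sketch of how GC content might be compared between two sets of sequences. It is an illustration only, not the procedure used in this study: the function names `gc_content` and `pooled_gc` and the toy sequences are assumptions introduced here, and real analyses would draw exon and intron sequences from annotated loci.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def pooled_gc(seqs) -> float:
    """GC content over all sequences pooled together (length weighted)."""
    return gc_content("".join(seqs))


# Toy sequences, invented for illustration only (not real lncRNA data).
exons = ["GCGCGGATCC", "CCGGAGGCTA"]
introns = ["ATTTAAGTAT", "TTATAGCAAT"]

exon_gc = pooled_gc(exons)
intron_gc = pooled_gc(introns)
print(f"exon GC = {exon_gc:.2f}, intron GC = {intron_gc:.2f}")
print("exons GC rich relative to introns:", exon_gc > intron_gc)
```

Pooling the sequences before counting weights each base equally, so long exons or introns contribute more than short ones; averaging per-sequence values instead would be an equally reasonable choice that answers a slightly different question.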
