Abstract

Background: Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Transcription is not restricted to regions with annotated gene features but includes almost any genomic context. Currently, the source and function of most RNAs originating from intergenic regions in the human genome remain unclear.

Results: We hypothesize that many intergenic RNAs can be ascribed to the presence of as-yet unannotated genes or the “fuzzy” transcription of known genes that extends beyond the annotated boundaries. To elucidate the contributions of these two sources, we assemble a dataset of more than 2.5 billion publicly available RNA-seq reads across 5 human cell lines and multiple cellular compartments to annotate transcriptional units in the human genome. About 80% of transcripts from unannotated intergenic regions can be attributed to the fuzzy transcription of existing genes; the remaining transcripts originate mainly from putative long non-coding RNA loci that are rarely spliced. We validate the transcriptional activity of these intergenic RNAs using independent measurements, including transcriptional start sites, chromatin signatures, and genomic occupancies of RNA polymerase II in various phosphorylation states. We also analyze the nuclear localization of intergenic transcripts and their sensitivity to nucleases to illustrate that they tend to be rapidly degraded either on-chromatin by XRN2 or off-chromatin by the exosome.

Conclusions: We provide a curated atlas of intergenic RNAs that distinguishes alternative processing of well-annotated genes from independent transcriptional units, based on the combined analysis of chromatin signatures, nuclear RNA localization, and degradation pathways.

Highlights

  • We find that most intergenic RNA is generated during transcription associated with annotated genes and is confined to chromatin because downstream-of-gene transcripts (DoGs) and linker of genes (LoGs) are efficiently degraded by XRN2, and upstream-of-gene transcripts (UoGs) by the exosome

  • Identification of intergenic transcriptional units: to gain a comprehensive overview of the transcriptional landscape, we identified 38 publicly available datasets containing chromatin- and nuclear-fractionated RNA-seq samples

Introduction

Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Studies estimate that up to 85% of the human genome is pervasively transcribed by RNA polymerase II (Pol II), resulting in a plethora of RNA products [1,2,3,4]. Many of these transcripts belong to well-established categories, such as messenger RNAs (mRNAs), which are characterized by the presence of a 5′ cap, a coding sequence (CDS), and a poly(A) tail. In the past decade, efforts have been made to identify and characterize novel lncRNA genes, either through computational predictions or functional assays [10, 11]. Despite such endeavors, a marked proportion of RNA-seq reads from human cells still maps to unannotated, ostensibly intergenic portions of the human genome [12]. It is often challenging to determine whether such reads originate from independent transcription units or are associated with annotated genes.
