Intergenic RNA mainly derives from nascent transcripts of known genes

Federico Agostini, Jernej Ule, Jan Attig, Julian Zagalak, Nicholas M Luscombe

doi:10.1186/s13059-021-02350-x

Abstract

Background: Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Transcription is not restricted to regions with annotated gene features but includes almost any genomic context. Currently, the source and function of most RNAs originating from intergenic regions in the human genome remain unclear.

Results: We hypothesize that many intergenic RNAs can be ascribed to the presence of as-yet unannotated genes or the “fuzzy” transcription of known genes that extends beyond the annotated boundaries. To elucidate the contributions of these two sources, we assemble a dataset of more than 2.5 billion publicly available RNA-seq reads across 5 human cell lines and multiple cellular compartments to annotate transcriptional units in the human genome. About 80% of transcripts from unannotated intergenic regions can be attributed to the fuzzy transcription of existing genes; the remaining transcripts originate mainly from putative long non-coding RNA loci that are rarely spliced. We validate the transcriptional activity of these intergenic RNAs using independent measurements, including transcriptional start sites, chromatin signatures, and genomic occupancies of RNA polymerase II in various phosphorylation states. We also analyze the nuclear localization and sensitivities of intergenic transcripts to nucleases to illustrate that they tend to be rapidly degraded either on-chromatin by XRN2 or off-chromatin by the exosome.

Conclusions: We provide a curated atlas of intergenic RNAs that distinguishes alternative processing of well-annotated genes from independent transcriptional units based on the combined analysis of chromatin signatures, nuclear RNA localization, and degradation pathways.

Highlights

Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs.
We find that most intergenic RNA is generated during transcription associated with annotated genes and is confined to chromatin due to efficient degradation of downstream of gene transcripts (DoGs) and linker of genes (LoGs) by XRN2, and upstream of gene transcripts (UoGs) by the exosome.
Identification of intergenic transcriptional units: To gain a comprehensive overview of the transcriptional landscape, we identified 38 publicly available datasets containing chromatin and nuclear fractionated RNA-seq samples.

Summary

Introduction

Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Studies estimate that up to 85% of the human genome is pervasively transcribed by RNA polymerase II (Pol II), resulting in a plethora of RNA products [1,2,3,4]. Many of these transcripts belong to well-established categories, such as messenger RNAs (mRNAs), which are characterized by the presence of a 5′ cap, a coding sequence (CDS), and a poly(A) tail. In the past decade, efforts have been made to identify and characterize novel long non-coding RNA (lncRNA) genes, either through computational predictions or functional assays [10, 11]. Despite such endeavors, a marked proportion of RNA-seq reads from human cells still maps to unannotated, ostensibly intergenic portions of the human genome [12]. It is often challenging to understand whether such reads originate from independent transcription units or are associated with annotated genes.
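One simple first step towards that question is to ask where a read falls relative to the nearest annotated gene on the same strand. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Gene` record, the `classify_position` function, the toy coordinates (0-based, half-open), and the "upstream"/"downstream" labels are all hypothetical and chosen for illustration only. A real analysis would also need a distance cutoff and coverage continuity checks, which are omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene annotation record (illustrative only)."""
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def classify_position(
    pos: int, chrom: str, strand: str, genes: List[Gene]
) -> Tuple[str, Optional[str]]:
    """Label one read position relative to same-strand annotated genes.

    Returns ("genic", name) if pos lies inside a gene, otherwise
    ("downstream", name) or ("upstream", name) for the nearest gene,
    or ("unassigned", None) if no gene exists on that chromosome/strand.
    """
    candidates = [g for g in genes if g.chrom == chrom and g.strand == strand]
    if not candidates:
        return ("unassigned", None)

    for g in candidates:
        if g.start <= pos < g.end:
            return ("genic", g.name)

    def distance(g: Gene) -> int:
        # Distance in bases from pos to the closest base of the gene.
        return g.start - pos if pos < g.start else pos - (g.end - 1)

    nearest = min(candidates, key=distance)
    past_end = pos >= nearest.end
    # On "+" transcription runs towards higher coordinates, so positions
    # beyond the gene end are downstream; on "-" the direction is reversed.
    if (strand == "+") == past_end:
        return ("downstream", nearest.name)
    return ("upstream", nearest.name)


# Toy annotation with made-up genes and coordinates.
TOY_GENES = [
    Gene("geneA", "chr1", 100, 200, "+"),
    Gene("geneB", "chr1", 1000, 1200, "-"),
]

if __name__ == "__main__":
    for pos, strand in [(150, "+"), (250, "+"), (50, "+"), (1300, "-"), (900, "-")]:
        print(pos, strand, classify_position(pos, "chr1", strand, TOY_GENES))
```

The strand check matters because "downstream" is defined by the direction of transcription rather than by genomic coordinate: a position at a higher coordinate than a minus-strand gene lies upstream of it.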

Journal: Genome Biology	Publication Date: May 5, 2021
Citations: 22	License type: open-access

Similar Papers

Pervasive Transcription of the Human Genome Produces Thousands of Previously Unidentified Long Intergenic Noncoding RNAs
Matthew J Hangauer ... Ian W Vaughn
PLoS Genetics | VOL. 9 | 20 Jun 2013

GENCODE: The reference human genome annotation for The ENCODE Project
Jennifer Harrow ...
Genome Research | VOL. 22 | 01 Sep 2012

The Human Mitochondrial Transcriptome
Tim R Mercer ... John S Mattick
Cell | VOL. 146 | 01 Aug 2011

Dissection of c-AMP Response Element Architecture by Using Genomic and Episomal Massively Parallel Reporter Assays
Jessica E Davis ... Sriram Kosuri
Cell Systems | VOL. 11 | 29 Jun 2020
