EPIGENE: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications

Anshupa Sahu, Ho-Ryun Chung, Na Li, Ilona Dunkel

doi:10.1186/s13072-020-00341-z

Abstract

Background

Understanding the transcriptome is critical for explaining the functional as well as regulatory roles of genomic regions. Current methods for the identification of transcription units (TUs) use RNA-seq, which, however, requires large quantities of mRNA, making it difficult to identify inherently unstable TUs such as miRNA precursors. This problem can be alleviated by chromatin-based approaches, because histone modifications correlate with transcription.

Results

Here, we introduce EPIGENE, a novel chromatin segmentation method for the identification of active TUs using transcription-associated histone modifications. Unlike existing chromatin segmentation approaches, EPIGENE uses a constrained, semi-supervised multivariate hidden Markov model (HMM) that models the observed combination of histone modifications as a product of independent Bernoulli random variables to identify active TUs. Our results show that EPIGENE can identify genome-wide TUs in an unbiased manner. EPIGENE-predicted TUs show an enrichment of RNA Polymerase II at the transcription start site and in the gene body, indicating that they are indeed transcribed. Comprehensive validation using existing annotations revealed that 93% of EPIGENE TUs can be explained by existing gene annotations and that 5% of EPIGENE TUs in HepG2 can be explained by microRNA annotations. EPIGENE outperformed existing RNA-seq-based approaches in TU prediction precision across human cell lines. Finally, we identified 232 novel TUs in K562 and 43 novel cell-specific TUs, all of which were supported by RNA Polymerase II ChIP-seq and nascent RNA-seq data.

Conclusion

We demonstrate the applicability of EPIGENE to identify genome-wide active TUs and to provide valuable information about unannotated TUs. EPIGENE is an open-source method and is freely available at: https://github.com/imbbLab/EPIGENE.
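The abstract names two modelling ideas: each genomic bin's combination of histone marks is scored as a product of independent Bernoulli variables, and the HMM is constrained. The Python below is a minimal sketch of both, assuming a toy four-state model (`background`, `tss`, `body`, `tes`), only two marks, hand-set probabilities, and Viterbi decoding. The state names, the left-to-right transition constraint, all numbers, and every function name are hypothetical illustrations rather than EPIGENE's published state space or code, and the semi-supervised parameter training is not shown.

```python
import math

# Toy sketch, NOT EPIGENE's implementation. State names, the two marks used,
# and every probability below are hypothetical illustration values.
STATES = ["background", "tss", "body", "tes"]
MARKS = ["H3K4me3", "H3K36me3"]  # presence/absence call per genomic bin

# Assumed constraint: a TU is entered at its TSS, runs through the body and
# ends at a TES before returning to background. Unlisted moves have probability 0.
ALLOWED = {
    "background": ["background", "tss"],
    "tss": ["tss", "body"],
    "body": ["body", "tes"],
    "tes": ["tes", "background"],
}


def constrained_transitions(stay=0.9):
    """Put `stay` on the self-loop and split the rest over the other allowed moves."""
    trans = {}
    for s, targets in ALLOWED.items():
        move = (1.0 - stay) / (len(targets) - 1)
        trans[s] = {t: (stay if t == s else move) for t in targets}
    return trans


def safe_log(x):
    """log(x), with log(0) mapped to -inf so forbidden moves are never chosen."""
    return math.log(x) if x > 0 else float("-inf")


def log_emission(obs, p):
    """log P(obs | state): a product of independent Bernoulli terms, one per mark."""
    return sum(math.log(pi) if x else math.log(1.0 - pi) for x, pi in zip(obs, p))


def viterbi(observations, start_p, trans_p, emit_p):
    """Most likely hidden-state path, computed in log space."""
    score = {s: safe_log(start_p[s]) + log_emission(observations[0], emit_p[s]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Best predecessor of s; forbidden transitions contribute -inf.
            prev = max(STATES, key=lambda r: score[r] + safe_log(trans_p[r].get(s, 0.0)))
            new_score[s] = (score[prev] + safe_log(trans_p[prev].get(s, 0.0))
                            + log_emission(obs, emit_p[s]))
            ptr[s] = prev
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best-scoring final state.
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


start = {"background": 1.0, "tss": 0.0, "body": 0.0, "tes": 0.0}
emit = {  # P(mark present | state), in MARKS order
    "background": [0.05, 0.05],
    "tss": [0.9, 0.2],
    "body": [0.1, 0.9],
    "tes": [0.05, 0.5],
}
bins = [[0, 0], [1, 0], [0, 1], [0, 1], [0, 0]]
path = viterbi(bins, start, constrained_transitions(), emit)
print(path)
```

With these toy numbers the final mark-free bin is still decoded as `body`, since the 0.9 self-loop outweighs the single step into `tes`; the constraint only guarantees that any exit from `body` passes through `tes`, so every decoded TU has a gene-like order of states.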

Highlights

Understanding the transcriptome is critical for explaining the functional as well as regulatory roles of genomic regions
We developed a semi-supervised hidden Markov model (HMM), EPIGENE (EPIgenomic GENE), which is trained on the combinatorial pattern of the IHEC class 1 histone modifications (H3K27ac, H3K4me1, H3K4me3, H3K36me3, H3K27me3, and H3K9me3) to infer hidden “transcription unit states”
Schematic overview of EPIGENE: EPIGENE uses a multivariate HMM, which allows probabilistic modelling of the combinatorial presence and absence of multiple IHEC class 1 histone modifications
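The highlights describe the model input only as the combinatorial presence and absence of the six IHEC class 1 marks. As a hedged sketch of how per-bin read counts might be turned into such 0/1 vectors, the Python below calls a mark present when its count is improbably high under a Poisson background whose mean is that mark's average count across bins. This scheme is similar in spirit to the Poisson-background binarization used by ChromHMM; the 1e-4 cutoff, the mean-count background, and the helper names `poisson_sf`, `binarize` and `binarize_marks` are assumptions for illustration and not necessarily EPIGENE's own preprocessing.

```python
import math

# Hypothetical preprocessing sketch: the Poisson-background threshold and the
# use of each mark's mean count as the background are assumptions.
MARKS = ["H3K27ac", "H3K4me1", "H3K4me3", "H3K36me3", "H3K27me3", "H3K9me3"]


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf, term = 0.0, math.exp(-lam)  # term starts at P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)  # clamp tiny negative rounding errors


def binarize(counts, background, threshold=1e-4):
    """1 if a bin's count is improbably high under Poisson(background), else 0."""
    return [1 if poisson_sf(c, background) < threshold else 0 for c in counts]


def binarize_marks(counts_by_mark, threshold=1e-4):
    """Turn per-mark read counts into one 0/1 vector per bin, in MARKS order.

    counts_by_mark maps each mark to a list of per-bin read counts; all lists
    must have the same non-zero length.
    """
    calls = {}
    for mark in MARKS:
        counts = counts_by_mark[mark]
        background = sum(counts) / len(counts)  # assumed: mean count as background
        calls[mark] = binarize(counts, background, threshold)
    n_bins = len(counts_by_mark[MARKS[0]])
    return [[calls[mark][i] for mark in MARKS] for i in range(n_bins)]


counts = {mark: [1] * 10 for mark in MARKS}
counts["H3K4me3"][2] = 40   # promoter-like signal in bin 2
counts["H3K36me3"][3] = 40  # gene-body-like signal in bins 3 and 4
counts["H3K36me3"][4] = 40
for vector in binarize_marks(counts):
    print(vector)
```

Each printed vector is one bin's combinatorial mark pattern, which is the kind of observation a multivariate Bernoulli HMM consumes.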

Summary

Introduction

Understanding the transcriptome is critical for explaining the functional as well as regulatory roles of genomic regions. Current methods for the identification of transcription units (TUs) use RNA-seq, which requires large quantities of mRNA, making it difficult to identify inherently unstable TUs such as miRNA precursors. This problem can be alleviated by chromatin-based approaches, because histone modifications correlate with transcription. Transcription units (TUs) represent the transcribed regions of the genome, which give rise to protein-coding transcripts as well as regulatory non-coding RNAs such as microRNAs. Accurate identification of TUs is important to […]. This is problematic for the accurate identification of inherently unstable TUs such as primary miRNAs. These shortcomings of existing approaches can be alleviated with chromatin-based approaches [21, 22], owing to the association between histone modifications and transcription.

Journal: Epigenetics & Chromatin
Publication Date: Apr 7, 2020
Citations: 2
License type: open-access

Similar Papers

MSL Complex Is Attracted to Genes Marked by H3K36 Trimethylation Using a Sequence-Independent Mechanism
Erica Larschan ... Mitzi I Kuroda
Molecular Cell | VOL. 28
01 Oct 2007

Comparative organization of active transcription units in Oncopeltus fasciatus
Victoria E Foe ... Charles D Laird
Cell | VOL. 9
01 Sep 1976

Proteasome inhibition creates a chromatin landscape favorable to RNA Pol II processivity
H Karimi Kinyamu ... Trevor K Archer
Journal of Biological Chemistry | VOL. 295
01 Jan 2020

Large transcription units unify copy number variants and common fragile sites arising under replication stress.
Thomas E Wilson ... Sountharia Rajendran
Genome Research | VOL. 25
04 Nov 2014
