Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns

Maria Osmala, Harri Lähdesmäki

doi:10.1186/s12859-020-03621-3

Open Access

https://doi.org/10.1186/s12859-020-03621-3

Journal: BMC Bioinformatics
Publication Date: Jul 20, 2020
Citations: 8
License type: open-access

Affiliation: Aalto University

Abstract

Background: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently.

Results: In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity of the genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are utilised to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated based on the transcriptional regulatory protein binding sites and compared to the predictions obtained by state-of-the-art methods.

Conclusion: PREPRINT performs favorably compared to the state-of-the-art methods, especially when the methods are required to predict a larger set of enhancers. PREPRINT generalises successfully to data from a cell type not utilised for training, and it often performs better than the previous methods. The PREPRINT enhancers are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid genome interpretation in functional genomics and clinical studies.
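The scoring idea described in the abstract can be illustrated with a small sketch. This is not PREPRINT's implementation: it assumes, purely for illustration, that the read counts in each bin of a window are independent and Poisson-distributed around an average coverage profile, and it uses a log-likelihood ratio as the probabilistic score. The function names (`mean_profile`, `poisson_loglik`, `enhancer_score`) and the toy counts are hypothetical. The subsequent step of training a kernel-based classifier on such scores is omitted.

```python
import math

def mean_profile(windows):
    """Average per-bin coverage over training windows (a stand-in for the
    'characteristic coverage pattern' of one chromatin feature)."""
    n = len(windows)
    return [sum(w[i] for w in windows) / n for i in range(len(windows[0]))]

def poisson_loglik(counts, profile, eps=1e-6):
    """Log-likelihood of integer bin counts under independent Poisson bins
    whose rates are given by the profile (assumed model; eps avoids log(0))."""
    total = 0.0
    for k, lam in zip(counts, profile):
        lam = max(lam, eps)
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total

def enhancer_score(counts, enh_profile, bg_profile):
    """Log-likelihood ratio: positive values favour the enhancer pattern."""
    return poisson_loglik(counts, enh_profile) - poisson_loglik(counts, bg_profile)

# Toy data (made up): 5-bin windows of read counts for one chromatin feature.
enhancer_windows = [[1, 4, 9, 5, 1], [2, 5, 11, 4, 2], [1, 3, 10, 6, 1]]
background_windows = [[2, 1, 2, 3, 2], [1, 2, 1, 2, 3], [3, 2, 2, 1, 2]]

enh_profile = mean_profile(enhancer_windows)
bg_profile = mean_profile(background_windows)

peaked_query = [1, 5, 10, 4, 2]  # resembles the peaked enhancer pattern
flat_query = [2, 2, 1, 2, 2]     # resembles the flat background pattern

peaked_score = enhancer_score(peaked_query, enh_profile, bg_profile)
flat_score = enhancer_score(flat_query, enh_profile, bg_profile)
```

In this toy setting the peaked query receives a positive score and the flat query a negative one. In a fuller pipeline, one such score per chromatin feature would form the feature vector passed to the classifier.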

Highlights

The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq).
The performances of PREPRINT and the state-of-the-art methods Random Forest-based Enhancer identification from Chromatin States (RFECS) and ChromHMM were compared on chromatin feature data from the ENCODE first data production phase Tier 1 cell lines K562 and GM12878.
The lengths of the PREPRINT enhancer predictions were less sensitive to changes in the prediction threshold than the lengths of the RFECS predictions. The prediction threshold was demonstrated to influence the final number of predicted enhancers and their validation rates.

Summary

Introduction

The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently. Enhancers have been shown to possess certain molecular and structural chromatin features, which can be utilised to locate them genome-wide. ChIP-seq can quantify the chromosomal locations of tens to hundreds of individual TRFs and histone modifications [9, 10]. Various combinations of the chromatin features have been adopted in several studies to locate enhancers [5, 11, 12, 13, 14, 15, 16, 17, 18].
