Abstract

The epigenetic landscape of cells plays a key role in establishing the cell-type-specific gene expression programs that characterize different cellular phenotypes. Several experimental procedures have been developed to probe the accessible chromatin landscape, including DNase-seq, FAIRE-seq and ATAC-seq. However, current downstream computational tools fail to reliably determine regulatory region accessibility from these experimental data. In particular, currently available peak calling algorithms are very sensitive to their parameter settings and produce highly heterogeneous results, which hampers the trustworthy identification of accessible chromatin regions. Here, we present a novel method that predicts accessible and, more importantly, inaccessible gene-regulatory chromatin regions relying solely on transcriptomics data, which complements and improves the results of currently available computational methods for chromatin accessibility assays. We trained a hierarchical classification tree model on publicly available transcriptomics and DNase-seq data and assessed the predictive power of the model on six gold standard datasets. Our method increases precision and recall compared to traditional peak calling algorithms. Moreover, its use is not limited to predicting accessible and inaccessible gene-regulatory chromatin regions: it is also a helpful tool for optimizing the parameter settings of peak calling methods in a cell-type-specific manner.

Highlights

  • Different studies have shown that active regulatory elements are located in accessible, i.e. nucleosome-depleted, chromosomal regions[14,15,16,17,18] and that chromatin accessibility is predictive of functional activity within a specific cell type[16]

  • Computational methods used for identifying genomic regions enriched with aligned reads – i.e. peak callers – have important limitations and, depending on the method used, the chromatin accessibility assignments can be significantly different after processing the same dataset

  • The parameterization used for controlling the false discovery rate of the peak callers is key, as more stringent cutoffs yield higher false negative rates, while less stringent cutoffs yield higher false positive rates
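The trade-off described in the last highlight can be illustrated with a minimal sketch. The q-values, the truth labels and the helper name `error_counts` below are invented for illustration and are not taken from the paper or from any peak caller's output; the point is only that tightening the cutoff trades false positives for false negatives.

```python
# Toy peaks as (q_value, is_truly_accessible) pairs.
# All values are hypothetical and chosen only to illustrate the trade-off.
peaks = [
    (0.001, True),
    (0.004, True),
    (0.008, False),
    (0.02, True),
    (0.04, False),
    (0.09, True),
    (0.2, False),
]


def error_counts(peaks, q_cutoff):
    """Return (false_positives, false_negatives) when peaks with q <= q_cutoff are called."""
    # A false positive is a called peak that is not truly accessible.
    false_positives = sum(1 for q, truth in peaks if q <= q_cutoff and not truth)
    # A false negative is a truly accessible region that was not called.
    false_negatives = sum(1 for q, truth in peaks if q > q_cutoff and truth)
    return false_positives, false_negatives


for cutoff in (0.005, 0.05, 0.25):
    fp, fn = error_counts(peaks, cutoff)
    print(f"q <= {cutoff}: false positives = {fp}, false negatives = {fn}")
```

On this toy data, the strictest cutoff misses two accessible regions without calling any spurious peak, while the most permissive cutoff recovers every accessible region at the cost of three spurious calls.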


Summary

Introduction

Different studies have shown that active regulatory elements are located in accessible, i.e. nucleosome-depleted, chromosomal regions[14,15,16,17,18] and that chromatin accessibility is predictive of functional activity within a specific cell type[16]. After deriving the classification model from RNA-seq expression data, we performed a thorough validation of our method for predicting chromatin accessibility against a gold standard dataset compiled from TF and histone modification ChIP-seq experiments. This analysis highlights the clear improvement of our predictions over peaks obtained from the most commonly used peak callers (MACS, Hotspot and F-Seq), regardless of the applied false discovery rate thresholds. Our method can not only predict the accessibility of gene-regulatory regions, but can also be used to optimize the parameters of current peak calling algorithms.
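As a rough illustration of how predictions might be scored against a gold standard, the sketch below computes precision and recall over labelled regulatory regions. The region identifiers, the labels and the function name `precision_recall` are hypothetical, and this is not the authors' evaluation code, only one plausible way to set up such a comparison.

```python
def precision_recall(predicted, gold, positive="accessible"):
    """Compute (precision, recall) for one class over regions present in both mappings."""
    # Only regions that have both a prediction and a gold-standard label are scored.
    shared = predicted.keys() & gold.keys()
    tp = sum(1 for r in shared if predicted[r] == positive and gold[r] == positive)
    fp = sum(1 for r in shared if predicted[r] == positive and gold[r] != positive)
    fn = sum(1 for r in shared if predicted[r] != positive and gold[r] == positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical gold-standard labels and predictions for five regulatory regions.
gold = {
    "region_1": "accessible",
    "region_2": "accessible",
    "region_3": "inaccessible",
    "region_4": "accessible",
    "region_5": "inaccessible",
}
predicted = {
    "region_1": "accessible",
    "region_2": "inaccessible",
    "region_3": "accessible",
    "region_4": "accessible",
    "region_5": "inaccessible",
}

precision, recall = precision_recall(predicted, gold)
print(f"accessible: precision = {precision:.2f}, recall = {recall:.2f}")
```

Passing `positive="inaccessible"` scores the inaccessible class in the same way, which matters here because the method is described as predicting both classes.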

