EP-DNN: A Deep Neural Network-Based Global Enhancer Prediction Algorithm.

Seong Gon Kim, Mrudul Harwani, Somali Chaterji, Ananth Grama

doi:10.1038/srep38433

Abstract

We present EP-DNN, a protocol for predicting enhancers based on chromatin features, in different cell types. Specifically, we use a deep neural network (DNN)-based architecture to extract enhancer signatures in a representative human embryonic stem cell type (H1) and a differentiated lung cell type (IMR90). We train EP-DNN using p300 binding sites, as enhancers, and TSS and random non-DHS sites, as non-enhancers. We perform same-cell and cross-cell predictions to quantify the validation rate and compare against two state-of-the-art methods, DEEP-ENCODE and RFECS. We find that EP-DNN has superior accuracy with a validation rate of 91.6%, relative to 85.3% for DEEP-ENCODE and 85.5% for RFECS, for a given number of enhancer predictions and also scales better for a larger number of enhancer predictions. Moreover, our H1 → IMR90 predictions turn out to be more accurate than IMR90 → IMR90, potentially because H1 exhibits a richer signature set and our EP-DNN model is expressive enough to extract these subtleties. Our work shows how to leverage the full expressivity of deep learning models, using multiple hidden layers, while avoiding overfitting on the training data. We also lay the foundation for exploration of cross-cell enhancer predictions, potentially reducing the need for expensive experimentation.
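As a rough illustration of the training setup described above, the following Python sketch labels p300 binding sites as enhancers and TSS and random non-DHS sites as non-enhancers, then scores a site with a feedforward network that has more than one hidden layer. The function names, the toy feature vectors, the layer sizes, the ReLU and sigmoid activations, and the weight initialization are all assumptions made for illustration, not details taken from EP-DNN. Training (for example by backpropagation) is omitted.

```python
import math
import random


def build_training_set(p300_sites, tss_sites, random_non_dhs_sites):
    """Label p300 sites 1 (enhancer); TSS and random non-DHS sites 0."""
    examples = [(features, 1) for features in p300_sites]
    examples += [(features, 0) for features in tss_sites]
    examples += [(features, 0) for features in random_non_dhs_sites]
    return examples


def init_layers(sizes, rng):
    """Create one (weights, biases) pair per layer.

    sizes = [n_inputs, hidden_1, ..., hidden_n, 1]; the scaling of the
    random weights is an illustrative choice, not the paper's.
    """
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Apply ReLU on hidden layers and a sigmoid on the output layer.

    Returns a single enhancer score between 0 and 1.
    """
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in z]
        else:
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]
    return x[0]


# Toy usage: four hypothetical chromatin features per site.
rng = random.Random(0)
examples = build_training_set(
    p300_sites=[[0.9, 0.8, 0.1, 0.7]],
    tss_sites=[[0.1, 0.9, 0.8, 0.2]],
    random_non_dhs_sites=[[0.0, 0.1, 0.0, 0.1]],
)
layers = init_layers([4, 8, 8, 1], rng)  # two hidden layers of 8 units
scores = [forward(layers, features) for features, _ in examples]
```

With untrained weights the scores carry no meaning; the sketch only shows how the labelled sets and a multi-layer scoring function fit together.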

Highlights

Cell types are unique, in spite of the fact that they contain the same genomic DNA, largely because of their differential gene expression patterns
When keeping the number of enhancer predictions by RFECS, DEEP-ENCODE (DEEP-EN), and EP-deep neural network (DNN) equal for purposes of comparison, we find that our protocol has superior accuracy, with a validation rate of 91.6%, for same-cell and cross-cell predictions, relative to 85.3% for DEEP-EN and 85.5% for RFECS
The first and most important observation is that EP-DNN performs better for validation and invalidity rates for both cell types, for same-cell and cross-cell predictions, across the entire range of number of enhancers being predicted
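The equal-number comparison in the highlights can be sketched as follows: each method's predictions are ranked by score, the same top k are kept for every method, and the validation rate is the fraction of those k that appear in a set of validated sites. The site identifiers, scores, and the contents of the validated set below are invented toy values, and treating validation as simple set membership is an assumption of this sketch, not the paper's exact procedure.

```python
def top_k(predictions, k):
    """Return the ids of the k highest-scoring (site_id, score) pairs."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    return [site_id for site_id, _ in ranked[:k]]


def validation_rate(predicted_ids, validated_ids):
    """Fraction of predicted sites that appear in the validated set."""
    if not predicted_ids:
        return 0.0
    hits = sum(1 for site_id in predicted_ids if site_id in validated_ids)
    return hits / len(predicted_ids)


# Toy data: two hypothetical methods scored on overlapping candidate sites.
method_a = [("s1", 0.9), ("s2", 0.8), ("s3", 0.4), ("s4", 0.2)]
method_b = [("s5", 0.95), ("s2", 0.7), ("s6", 0.6), ("s1", 0.3)]
validated = {"s1", "s2", "s3"}

k = 2  # same number of predictions for every method
rate_a = validation_rate(top_k(method_a, k), validated)
rate_b = validation_rate(top_k(method_b, k), validated)
```

Fixing k across methods keeps the comparison from favouring a method that simply makes fewer, more conservative predictions.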

Summary

Introduction

Cell types are unique, in spite of the fact that they contain the same genomic DNA, largely because of their differential gene expression patterns. Enhancers can be defined as short DNA sequences that regulate temporal and cell-type-specific basal gene transcription levels from transcription start sites (TSSs), at distances ranging from hundreds of bases to, in rare cases, even megabases[6,7,8]. Knowing their properties, regulatory activity, and genomic targets is crucial to the functional understanding of cellular events, ranging from cellular homeostasis to differentiation.

Several approaches have been used to identify enhancers. The first is mapping specific transcription factor (TF) binding sites (TFBS) through ChIP-seq[22]. This stems from the fact that short enhancer DNA sequences serve as binding sites for TFs, and the combined regulatory cues of all bound TFs determine ultimate enhancer activity[23,24]. The fourth approach uses histone modification patterns, produced by ChIP-seq, that consistently mark enhancer regions[29,30,31,32,33]; this is our method of choice in this paper.
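To make the histone modification approach concrete, the sketch below turns ChIP-seq read positions for a few marks into a fixed-length feature vector around a candidate site by counting reads in equal-width bins. The window and bin sizes, the choice of marks, and the function names are assumptions for illustration only; the source does not specify how the features are constructed.

```python
def binned_signal(read_positions, center, window=2000, bin_size=100):
    """Count reads per bin in a window centred on `center`."""
    start = center - window // 2
    counts = [0] * (window // bin_size)
    for pos in read_positions:
        offset = pos - start
        if 0 <= offset < window:
            counts[offset // bin_size] += 1
    return counts


def feature_vector(reads_by_mark, center, marks, window=2000, bin_size=100):
    """Concatenate the binned counts of each mark, in a fixed mark order."""
    features = []
    for mark in marks:
        reads = reads_by_mark.get(mark, [])
        features.extend(binned_signal(reads, center, window, bin_size))
    return features


# Toy usage: two marks, a 200 bp window split into four 50 bp bins.
reads_by_mark = {
    "H3K4me1": [950, 1000, 1049, 1999],
    "H3K27ac": [0, 1000],
}
vector = feature_vector(
    reads_by_mark, center=1000, marks=["H3K4me1", "H3K27ac"],
    window=200, bin_size=50,
)
```

Keeping the mark order fixed means every site yields a vector with the same layout, which is what a classifier trained on such features would require.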

Methods

Results

Discussion

Conclusion