Abstract

Understanding the cell-specific binding patterns of transcription factors (TFs) is fundamental to studying gene regulatory networks in biological systems, and ChIP-seq, which provides valuable data for this purpose, is considered the gold standard. Despite tremendous efforts from the scientific community to conduct TF ChIP-seq experiments, the available data cover only a limited percentage of all possible combinations of TFs and cell lines. In this study, we demonstrate a method for accurately predicting cell-specific TF binding for TF-cell line combinations, using available ChIP-seq data that cover only a small fraction (4%) of the combinations. The proposed model, termed TFImpute, is based on a deep neural network with a multi-task learning setting to borrow information across transcription factors and cell lines. Compared with existing methods, TFImpute achieves comparable accuracy on TF-cell line combinations with ChIP-seq data and better accuracy on TF-cell line combinations without ChIP-seq data. This approach can also predict cell line-specific enhancer activities in K562 and HepG2 cell lines, as measured by massively parallel reporter assays, and the impact of SNPs on TF binding.

Highlights

  • Transcription factors (TFs) play a central role in regulating gene expression

  • To study the binding of a TF to a DNA sequence in a cell line without corresponding ChIP-seq data, researchers typically check whether the sequence contains a motif for the TF

  • We demonstrate how to model the TF binding problem using deep learning and achieve cell-specific binding prediction for TF-cell line combinations without ChIP-seq data
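The motif check mentioned in the highlights can be illustrated with a minimal sketch. It is not the TFImpute method, only the sequence-only baseline that the highlights describe. The 4-bp motif matrix `TOY_PFM`, the uniform background frequency, and the function names are hypothetical and chosen for illustration; real analyses would use a curated motif and would usually scan both strands.

```python
import math

# Toy position frequency matrix for a hypothetical 4-bp motif (not a real TF).
# Each entry is one motif position, mapping a base to its probability.
TOY_PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds_score(window, pfm=TOY_PFM):
    """Sum of per-position log2(probability / background) for one window."""
    return sum(math.log2(pfm[i][base] / BACKGROUND)
               for i, base in enumerate(window))


def best_motif_hit(sequence, pfm=TOY_PFM):
    """Return (score, start) of the best-scoring forward-strand window."""
    width = len(pfm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    hits = [(log_odds_score(sequence[s:s + width], pfm), s)
            for s in range(len(sequence) - width + 1)]
    return max(hits)


# Scan a short example sequence; the window "ACGT" matches the toy motif.
score, start = best_motif_hit("TTACGTGG")
print(f"best window starts at {start} with log-odds score {score:.2f}")
```

Note that the score depends only on the DNA sequence, so it is identical for every cell line. A motif scan of this kind therefore cannot, on its own, capture cell-specific binding.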



Introduction

Transcription factors (TFs) play a central role in regulating gene expression. Although TF binding to chromatin is primarily dictated by the DNA sequence, the binding patterns can be cell-specific through cooperative interactions between many different TFs. Mutations in cis-regulatory elements can influence TF binding, with potentially deleterious effects [1,2,3]; it is therefore important to assess the impact of cis-element mutations on cell-specific TF binding and gene expression. ChIP-seq has been the gold standard for evaluating cell-specific TF binding [4]. Despite tremendous efforts from the scientific community to generate large-scale TF ChIP-seq data, such as those undertaken by the ENCODE consortium [5], most TFs have been profiled in only a limited number of cell types and conditions. Computational modeling of TF ChIP-seq data to infer the underlying binding rules and to predict TF binding under unprofiled conditions could be a useful and economical approach.
