Abstract

Background: Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technological developments make it possible to determine genome-wide chromatin accessibility in various cellular and developmental contexts. Chromatin accessibility profiles provide useful information for predicting TF binding events under various physiological conditions. Furthermore, ChIP-Seq analysis has been used to determine genome-wide binding sites for a range of TFs in multiple cell types. Integrating these two types of genomic information can improve the prediction of TF binding events.

Results: We assessed to what extent a model built on other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features, such as the specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types.

Conclusion: Integrating chromatin accessibility information with sequence information improves the prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting that there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on these universal “rules”.

Highlights

  • Computational prediction of transcription factor (TF) binding sites in different cell types is challenging

  • One key question related to the general usefulness of this approach is whether or not the model learned from other TFs in other cell types is transferable

  • We assessed the transferability for many TFs and different cell lines, and discovered that in most cases a model learned from other TFs, especially a combination of many TFs, performed almost as well as the model learned from the target TF


Summary

Introduction

Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Chromatin accessibility profiles provide useful information for predicting TF binding events under various physiological conditions. ChIP-Seq analysis has been used to determine genome-wide binding sites for a range of TFs in multiple cell types. Integrating these two types of genomic information can improve the prediction of TF binding events. TFs bind to specific DNA sequences and regulate the expression of downstream genes. Chromatin accessibility differs between cell types, so integrating this information reflects the dynamic nature of TF binding sites (TFBS) across cell types. Chromatin accessibility can be determined by DNase-Seq [15, 16, 17] or ATAC-Seq [18, 19], and many of these datasets have become available for diverse cell and tissue types.
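As a rough illustration of how sequence-derived and accessibility-derived information might be combined for a single candidate site, the Python sketch below builds a three-value feature vector: a motif match score, a mean conservation score, and a mean accessibility signal in a window around the site. Everything in it is an assumption made for illustration and is not taken from the paper: the 4-bp count matrix is invented, the log-odds scoring against a uniform background is one common convention, and the function names, flank size, and input tracks are hypothetical.

```python
import math

BASES = "ACGT"

# Hypothetical 4-bp count matrix (rows: motif positions; columns: A, C, G, T).
# The counts are invented for illustration and do not describe any real TF.
TOY_COUNTS = [
    [8, 1, 1, 0],
    [0, 9, 1, 0],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return matrix


def motif_score(site, matrix):
    """Sum the log-odds of the observed base at each motif position."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal motif length")
    return sum(matrix[i][BASES.index(b)] for i, b in enumerate(site.upper()))


def mean_signal(track, start, end, flank=0):
    """Average a per-base track over [start - flank, end + flank), clipped to the track."""
    lo = max(0, start - flank)
    hi = min(len(track), end + flank)
    window = track[lo:hi]
    return sum(window) / len(window) if window else 0.0


def site_features(sequence, start, matrix, conservation, accessibility, flank=2):
    """Return [motif score, mean conservation, mean accessibility] for one candidate site."""
    end = start + len(matrix)
    return [
        motif_score(sequence[start:end], matrix),        # cell type-independent
        mean_signal(conservation, start, end),           # cell type-independent
        mean_signal(accessibility, start, end, flank),   # cell type-specific
    ]


if __name__ == "__main__":
    # Toy inputs: a short sequence and two made-up per-base tracks of equal length.
    seq = "TTACGTAA"
    cons = [0.1, 0.2, 0.9, 0.9, 0.8, 0.9, 0.2, 0.1]
    acc = [0, 2, 5, 8, 8, 6, 3, 0]
    print(site_features(seq, 2, log_odds_matrix(TOY_COUNTS), cons, acc))
```

Vectors of this kind, labelled by whether a ChIP-Seq peak overlaps the site, could then be passed to a random forest classifier. Only the accessibility feature would need to be recomputed when moving to a new cell type, because the motif and conservation features do not depend on the cell type.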

