Abstract

Background: Transcription factors (TFs) bind specifically to TF binding sites (TFBSs) at cis-regulatory regions to control transcription. It is critical to locate these TF-DNA interactions to understand transcriptional regulation. Efforts to predict bona fide TFBSs benefit from the availability of experimental data mapping DNA binding regions of TFs (chromatin immunoprecipitation followed by sequencing, ChIP-seq).

Results: In this study, we processed ~10,000 public ChIP-seq datasets from nine species to provide high-quality TFBS predictions. After quality control, this processing yielded ~56 million TFBSs with experimental and computational support for direct TF-DNA interactions for 644 TFs in >1000 cell lines and tissues. These TFBSs were used to predict >197,000 cis-regulatory modules representing clusters of binding events in the corresponding genomes. The high quality of the TFBSs was reinforced by their evolutionary conservation, enrichment at active cis-regulatory regions, and capacity to predict combinatorial binding of TFs. Further, we confirmed that the cell type and tissue specificity of enhancer activity was correlated with the number of TFs with binding sites predicted in these regions. All the data are provided to the community through the UniBind database, which can be accessed through its web interface (https://unibind.uio.no/), a dedicated RESTful API, and as genomic tracks. Finally, we provide an enrichment tool, available as a web service and an R package, for users to find TFs with enriched TFBSs in a set of provided genomic regions.

Conclusions: UniBind is the first resource of its kind, providing the largest collection of high-confidence direct TF-DNA interactions in nine species.

Highlights

  • The regulation of gene expression is a complex process involving several biological mechanisms

  • We describe the update of the UniBind database, which stores > 72 million direct transcription factor (TF)-DNA interactions predicted using an updated ChIP-eat pipeline on ~ 10,000 ChIP-seq peak datasets from nine species: Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, and Schizosaccharomyces pombe

  • Quality control to establish a robust collection of direct TF-DNA interactions: in UniBind, we aimed to create a robust collection of bona fide direct TF-DNA interactions found in high-quality ChIP-seq peak datasets


Summary

Introduction

The regulation of gene expression is a complex process involving several biological mechanisms. The first step of the regulatory process controls where, when, and at which intensity RNAs are transcribed from their DNA template. This level of transcriptional regulation is mainly coordinated by transcription factors (TFs). TF ChIP-seq peaks usually span a few hundred base pairs. They derive from direct and indirect TF-DNA interactions [4], where the latter can emerge from protein-protein interactions between the ChIP'ed TF and another protein binding the DNA. Several repositories store ChIP-seq peaks [5,6,7,8,9] and are freely available to the community. However, these resources do not provide precise locations of the underlying direct TF-DNA interactions. Efforts to predict bona fide TFBSs benefit from the availability of experimental data mapping DNA binding regions of TFs (chromatin immunoprecipitation followed by sequencing, ChIP-seq).
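To make the gap between a ChIP-seq peak (a few hundred base pairs) and a precise TFBS concrete, the following is a minimal Python sketch that scans a peak sequence with a position weight matrix (PWM) and reports the best-scoring window. It is an illustration only, not the pipeline used to build UniBind: the PWM values, the toy peak sequence, the peak start coordinate, and the function name `best_pwm_hit` are all assumptions, and only the forward strand is scanned.

```python
# Hypothetical log-odds PWM for a 4-bp motif (consensus ACGT).
# Each dict is one motif position; the scores are made up for illustration.
PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -0.8, "T": -1.0},
    {"A": -0.7, "C": -1.0, "G": 1.4, "T": -1.0},
    {"A": -1.0, "C": -0.6, "G": -1.0, "T": 1.3},
]


def best_pwm_hit(peak_seq, pwm):
    """Return (offset, score, site) for the highest-scoring window.

    Scans the forward strand only and skips windows containing
    characters other than A, C, G, T. Returns None if no window
    can be scored.
    """
    width = len(pwm)
    best = None
    for offset in range(len(peak_seq) - width + 1):
        window = peak_seq[offset:offset + width]
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (offset, score, window)
    return best


# Toy peak sequence and an assumed 0-based peak start coordinate.
peak_seq = "TTGACGTAAGGTTTG"
peak_start = 1000

hit = best_pwm_hit(peak_seq, PWM)
offset, score, site = hit

# Convert the within-peak offset to half-open genomic coordinates.
site_start = peak_start + offset
site_end = site_start + len(PWM)
print(site, site_start, site_end)
```

In practice a reverse-strand scan and a score threshold would also be needed; the sketch only shows how a motif model can narrow a broad peak to a candidate site.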
