Abstract

Transcription factor-DNA interactions, central to cellular regulation and control, are commonly described by position weight matrices (PWMs). These matrices are frequently used to predict transcription factor binding sites in regulatory regions of DNA to complement and guide further experimental investigation. The DNA sequence preferences of transcription factors, encoded in PWMs, are dictated primarily by select residues within the DNA binding domain(s) that interact directly with DNA. Therefore, the DNA binding properties of homologous transcription factors with identical DNA binding domains may be characterized by PWMs derived from different species. Accordingly, we have implemented a fully automated domain-level homology searching method for identical DNA binding sequences.

By applying the domain-level homology search to transcription factors with existing PWMs in the JASPAR and TRANSFAC databases, we were able to significantly increase coverage in terms of the total number of PWMs associated with a given species, assign PWMs to transcription factors that did not previously have any associations, and increase the number of species represented by PWMs by more than an order of magnitude. Additionally, using protein binding microarray (PBM) data, we have validated the domain-level method by demonstrating that transcription factor pairs with matching DNA binding domains exhibit DNA binding specificity predictions comparable to those of transcription factor pairs with completely identical sequences.

The increased coverage achieved herein demonstrates the potential for more thorough species-associated investigation of protein-DNA interactions using existing resources. The PWM scanning results highlight the challenging nature of transcription factors that contain multiple DNA binding domains, as well as the impact of motif discovery on the ability to predict DNA binding properties. The method is additionally suitable for identifying domain-level homology mappings to enable utilization of additional information sources in the study of transcription factors. The domain-level homology search method, resulting PWM mappings, web-based user interface, and web API are publicly available at http://dodoma.systemsbiology.net.

Highlights

  • Gene expression is in part regulated by sequence-specific binding of transcription factors (TFs) to target cis-regulatory elements in DNA

  • We demonstrate the validity of the domain-level homology mapping approach on protein binding microarray data and discuss the resulting increase in coverage in terms of the total number of position weight matrices (PWMs) associated with each species, as well as the total number of TFs with an assigned PWM, obtained for each PWM

  • Quantifying increased coverage: owing to the species-centric view that is frequently taken in studying regulatory networks, we assessed the increase in unique position weight matrices (PWMs) associated with each species that the present domain-level homology mapping approach enables, relative to existing curation in the JASPAR and TRANSFAC databases


Summary

Introduction

Gene expression is in part regulated by sequence-specific binding of transcription factors (TFs) to target cis-regulatory elements in DNA. TF-DNA interactions are commonly described by position weight matrices (PWMs), derived by aligning all known binding sequences for a TF and log-transforming the number of observations of each nucleotide at each position [1,2]. Through statistical-mechanical theory, PWMs relate the observed DNA sequence frequencies used to formulate them to estimates of TF-DNA binding energies [3]. JASPAR [4] and TRANSFAC [5] are two curated databases providing extensive collections of transcription factor PWMs across many species.
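
To make the PWM construction described above concrete, the following is a minimal Python sketch, not the implementation used by JASPAR, TRANSFAC, or the present method. It counts nucleotides at each position of a set of aligned binding sites, converts the counts to log-odds scores against a background distribution, and scans a sequence for the best-scoring window. The example sites, the uniform background, the pseudocount of 1, and the function names `build_pwm` and `scan` are all illustrative assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PWM (one dict per position) from aligned sites.

    Assumptions: sites are equal-length strings over A, C, G, T; the
    background defaults to uniform; a pseudocount avoids log(0).
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    pwm = []
    for pos in range(width):
        # Count observations of each nucleotide at this position.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-transform the frequencies relative to the background.
        pwm.append(
            {b: math.log2((counts[b] / total) / background[b]) for b in BASES}
        )
    return pwm


def scan(pwm, sequence):
    """Score every window of the sequence; return (offset, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(col[base] for col, base in zip(pwm, sequence[i:i + width])))
        for i in range(len(sequence) - width + 1)
    ]


# Hypothetical aligned binding sites, made up for illustration only.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(sites)

# Scan a short made-up sequence and report the best-scoring window.
hits = scan(pwm, "GGTGACTCAGG")
best_offset, best_score = max(hits, key=lambda hit: hit[1])
print(best_offset, round(best_score, 2))
```

In this sketch a higher window score indicates a closer match to the aligned sites. In practice a score threshold would be chosen to call predicted binding sites, and both strands of the DNA would usually be scanned.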
