Abstract

Background: Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computational searches can predict TF binding sites (TFBS) using position weight matrices (PWMs), which represent the positional base frequencies of collected, experimentally determined TFBS. A disadvantage of this approach is the large number of predicted sites it returns for genomic DNA. One strategy for identifying genuine TFBS is to use local concentrations of predicted TFBS. It is unclear whether TFBS generally tend to cluster at promoter regions, although this is the case for certain TFBS. It is also unclear which TFs have TFBS concentrated in promoters, and to what extent. This study addresses some of these questions.

Results: We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, and the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) on the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups: those strongly correlated with CGI and those not correlated with CGI.

Conclusion: Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions.
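
The Results mention partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster). As a rough illustration only, the standard first-order partial correlation can be sketched as below. The function names and the toy vectors are hypothetical, and the abstract does not say how the three properties were actually encoded or computed in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z.

    Standard first-order formula:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data (assumed, not from the study): x and y are each driven by z
# plus an independent component, so they correlate only through z.
z = [0, 0, 0, 0, 1, 1, 1, 1]
noise_x = [0, 0, 1, 1, 0, 0, 1, 1]
noise_y = [0, 1, 0, 1, 0, 1, 0, 1]
x = [a + b for a, b in zip(z, noise_x)]
y = [a + b for a, b in zip(z, noise_y)]

raw = pearson(x, y)              # correlation before controlling for z
adjusted = partial_corr(x, y, z)  # correlation after controlling for z
```

In this toy case the raw correlation between `x` and `y` comes entirely from their shared dependence on `z`, so it vanishes once `z` is controlled for. That mirrors the idea of asking whether a cluster-promoter correlation persists after the CGI effect is reduced.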

Highlights

  • Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences

  • Divergent preferences of TF binding sites (TFBS) for promoter sequences: we determined whether predicted TFBS formed clusters in human promoter sequences or in non-promoter sequences for each position weight matrix (PWM), using the cluster score described in the Methods section

  • We found that TFBS clusters corresponding to 47% of PWMs are positively correlated with promoter sequences, and that TFBS clusters corresponding to around 11% of PWMs are negatively correlated with promoter sequences


Summary

Introduction

Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. The identification of genuine TFBS by searching for clusters of predicted TFBS has been successful, but these studies were evaluated only with specific genes and TF sets, such as those found in yeast [11], Drosophila (early developmental enhancer) [12,13,14], liver [15], LSF, and muscle-specific regulatory regions [16,17]. It is unknown whether this method is applicable to other species or genes. The program rVISTA utilizes information from conserved regions between human and mouse, in addition to clusters of TFBS predicted by the MATCH (BIOBASE) program [19]. This approach was evaluated using several known TFs (AP-1, NFAT, and GATA-3) and genes from the cytokine gene cluster. Most of these studies use specific sets of coregulated genes to identify common predicted TFBS clusters, and cannot be applied directly to the study of general properties of promoters.
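
To make the idea of predicted TFBS clusters concrete, here is a minimal sketch of scanning a sequence with a PWM and counting hits that fall close together. Everything in it is an assumption for illustration: the toy matrix, the uniform background, the score threshold, the window size and the function names are hypothetical. It is not the MATCH program or the cluster score used in this study.

```python
import math

# Hypothetical 4-position PWM favouring "TATA": per-position base frequencies.
PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
BACKGROUND = 0.25  # assumed uniform base composition

def score_site(site, pwm):
    """Log-odds score of one candidate site against the background."""
    return sum(math.log2(col[base] / BACKGROUND) for base, col in zip(site, pwm))

def predict_sites(seq, pwm, threshold):
    """Start positions on the forward strand whose score reaches the threshold.

    Only A, C, G and T are handled; other symbols would raise KeyError.
    """
    width = len(pwm)
    return [
        i
        for i in range(len(seq) - width + 1)
        if score_site(seq[i:i + width], pwm) >= threshold
    ]

def max_hits_in_window(positions, window):
    """Largest number of hit start positions within any span of `window` bp.

    `positions` must be sorted in ascending order.
    """
    best = 0
    left = 0
    for right, pos in enumerate(positions):
        # Shrink from the left until all hits fit inside the window.
        while pos - positions[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best

seq = "GGTATAGGTATACC"
hits = predict_sites(seq, PWM, threshold=5.0)
dense = max_hits_in_window(hits, window=10)
```

A simple hit count in a window is only one way to quantify local concentration. Comparing such counts between promoter and non-promoter sequences, as the study does with its own cluster score, would need a control set and a statistical measure on top of this.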

