Abstract
Gene expression in multicellular organisms is regulated by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To find functional regulatory elements, computational searches can predict TF binding sites (TFBS) using position weight matrices (PWMs), which represent the positional base frequencies of collected, experimentally determined TFBS. However, it is still difficult to distinguish authentic sites from false positives. Reports have shown that particular TFBS are concentrated in promoters, but whether this is a general tendency remains uncertain. Computational approaches that reveal the structure of promoters as combinations of TFBS are therefore required. Here we examined the correlation between predicted TFBS and promoters and identified two PWM groups: 1) PWMs whose TFBS are clustered in promoters mainly because of the presence of CpG islands (CGIs), and 2) PWMs whose TFBS are clustered in promoters independently of CGIs. As an application of these groups, we show that tissue-specific genes can be extracted by finding clusters of predicted TFBS of selected PWMs in promoters.
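To illustrate how a PWM can be used to predict TFBS, the following is a minimal sketch in Python, not the method used in this study. It assumes a log-odds scoring scheme with a uniform background (0.25 per base) and a pseudocount of 1, and it scans only one strand. The aligned sites, the test sequence, the score threshold, and the function names `build_pwm` and `scan` are all hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned binding sites.

    Assumption: uniform background frequency and a fixed pseudocount;
    a real analysis may use genome-specific background frequencies.
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        # Count each base at this position, starting from the pseudocount.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log2 ratio of the observed frequency to the background frequency.
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold.

    Assumption: the sequence contains only A, C, G, and T; forward strand only.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical aligned sites (illustrative only, not data from the study).
sites = ["TATAAA", "TATAAA", "TATATA", "TATAAT"]
pwm = build_pwm(sites)

# Hypothetical sequence and threshold.
hits = scan("GGCGTATAAAGGC", pwm, threshold=5.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

Lowering the threshold in such a scan returns more candidate sites, which mirrors the difficulty described above: a PWM match alone does not separate authentic sites from false positives.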