Abstract

We analyzed the frequency distribution of TATA Box and TATA extension sequences in six data sets of human promoters. The promoters in these sets differ in length and come from different types of genes (housekeeping genes, tissue-specific genes, and all genes). The statistical approach developed in this study first partitions the promoters into 20 bp bins, then calculates the frequency distribution of TATA elements and TATA extension sequences across those bins. The median value is used to single out outstanding TATA elements or TATA extension sequences when calculating their statistical significance. Two of the 16 TATA Box elements (TATAAAAG and TATATAAG) showed the sharpest peaks 10 to 30 bp upstream of the transcription start site, where the TATA Box is believed to reside. Among all TATA extension sequences, fourteen also showed their sharpest peaks at this location. Two of these fourteen have been verified as transcription factor binding sites by other research efforts. We suggest that the remaining twelve are new putative TATA binding sites. We also found very little difference between the frequency distribution of TATA elements in housekeeping genes and their frequency distribution in tissue-specific genes.
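The binning step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `binned_counts` and `peak_over_median` are hypothetical, each promoter string is assumed to end at the transcription start site, and comparing the fullest bin against the median bin count is only one plausible reading of how the median is used, since the abstract does not give the actual significance calculation.

```python
from statistics import median

BIN_SIZE = 20  # bin width in bp, as stated in the abstract


def binned_counts(promoters, motif, bin_size=BIN_SIZE):
    """Count motif start positions per bin, pooled over all promoters.

    Assumption: every promoter string ends at the transcription start
    site, so bin 0 covers the 20 bp immediately upstream of the TSS.
    """
    n_bins = max(len(p) for p in promoters) // bin_size
    counts = [0] * n_bins
    for seq in promoters:
        start = seq.find(motif)
        while start != -1:
            upstream = len(seq) - start      # distance of motif start from the TSS
            b = (upstream - 1) // bin_size   # 1..20 bp -> bin 0, 21..40 bp -> bin 1, ...
            if b < n_bins:                   # ignore a partial bin at the far end
                counts[b] += 1
            start = seq.find(motif, start + 1)
    return counts


def peak_over_median(counts):
    """Return (index of the fullest bin, its count, median count over all bins).

    Assumption: the median serves as the background level against which
    the peak is judged; the paper's exact statistic is not reproduced here.
    """
    peak_bin = max(range(len(counts)), key=counts.__getitem__)
    return peak_bin, counts[peak_bin], median(counts)
```

Calling `binned_counts(promoters, "TATAAAAG")` on a list of TSS-aligned sequences yields one count per 20 bp bin, and `peak_over_median` then reports which bin stands out relative to the median. A real analysis would also need to decide how to handle promoters of unequal length and motifs that straddle bin boundaries.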
