Abstract

Ab initio prediction of the DNA regulatory regions known as transcription factor binding sites (TFBS) remains a major challenge in modern bioinformatics. Although the currently available methods are not perfect, they are fairly reliable and can be used to search for new potential protein-DNA interaction sites. The biggest problem of ab initio approaches is the very high false positive rate of predicted sites, which results mainly from the fact that TFBS are very short and highly degenerate. Because of that, they can occur by chance every few hundred bases, which makes computational prediction extremely difficult if one aims to reduce the false positive rate while keeping the highest possible sensitivity for biologically meaningful sequence regions. In this work we present a new application that can be used to predict TFBS regions in very large datasets based on position weight matrix (PWM) models, using one of the most popular prediction methods. The application was used to predict the concentration of TFBS in a set of nearly 2,200 unique sequences of human gene promoter regions. The study revealed that the concentration of TFBS is constant further than 1 kbp from the transcription initiation site but decreases rapidly closer to the transcription initiation site. The decreasing TFBS concentration in the vicinity of genes might result from evolutionary selection, which keeps only those sites responsible for interactions with proteins that are part of a specific regulatory mechanism leading to cell survival.
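
To make the PWM-based scanning idea concrete, the following is a minimal sketch, not the application described above. It assumes log-odds scoring against a uniform background, which is one common way to use a PWM; the abstract does not say which scoring method the application implements. The count matrix, pseudocount, threshold, and the function names `build_pwm` and `scan` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
# These numbers are invented for illustration, not taken from the paper.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert base counts into log2-odds weights (assumed uniform background)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Slide the PWM along one strand and report windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(COUNTS)
    for start, window, score in scan("TTACGTTTGGACGTAA", pwm, threshold=4.0):
        print(start, window, round(score, 2))
```

Lowering the threshold in this sketch admits more degenerate windows, which illustrates the trade-off between sensitivity and false positives for short motifs.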

