Abstract

Developing reliable methods for recognizing transcription factor binding sites (TFBSs) is an important step in large-scale genome analysis. Most methods currently used to predict functional TFBSs are hampered by high false-positive rates, which arise when too few functionally characterised sequences are available and only sequence conservation within the site core is considered. We propose two methods to search for binding sites (BSs) of the peroxisome proliferator-activated receptor (PPAR), known as peroxisome proliferator response elements (PPREs). The first is an optimized dinucleotide position weight matrix (PWM) model. The second is the SiteGA model, which uses a genetic algorithm with a discriminant function of locally positioned dinucleotides to infer the most important positions and dinucleotides. Our analysis used two PPRE datasets, consisting of 37 and 98 BSs, respectively. We showed that extending the dataset improved the accuracy of the SiteGA model but not of the PWM model. Finally, we applied both models (PWM and SiteGA) in combination to the dataset of annotated human promoters (EPD). We demonstrated that the larger dataset and the longer window length supported a notable increase in accuracy for the PWM and SiteGA models. Consequently, combined application of PWM and SiteGA may better restrict the number of potential targets in the EPD promoter dataset.
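
For readers unfamiliar with dinucleotide PWMs, the following is a minimal illustrative sketch of the general technique, not the optimized model described above. It assumes a uniform background of 1/16 per dinucleotide and a pseudocount of 1; the function names (`build_dinuc_pwm`, `score_window`, `scan`) and the toy aligned sites are hypothetical and are not taken from the PPRE datasets.

```python
import math
from itertools import product

# All 16 dinucleotides over the DNA alphabet.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def build_dinuc_pwm(sites, pseudocount=1.0):
    """Build a log-odds dinucleotide matrix from equal-length aligned sites.

    Assumes uppercase A/C/G/T only and a uniform background (1/16 per
    dinucleotide); both are simplifying assumptions for illustration.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length - 1):  # one column per overlapping dinucleotide
        counts = {d: pseudocount for d in DINUCS}
        for s in sites:
            counts[s[i:i + 2]] += 1
        total = sum(counts.values())
        pwm.append({d: math.log2((c / total) * 16) for d, c in counts.items()})
    return pwm


def score_window(pwm, window):
    """Sum the log-odds weights of the dinucleotides in one window."""
    return sum(col[window[i:i + 2]] for i, col in enumerate(pwm))


def scan(pwm, seq):
    """Score every window of the site width along a sequence."""
    width = len(pwm) + 1
    return [(pos, score_window(pwm, seq[pos:pos + width]))
            for pos in range(len(seq) - width + 1)]


# Toy aligned sites (hypothetical, for illustration only).
toy_sites = ["AGGTCA", "AGGTCA", "AGGACA", "AGGTCG"]
pwm = build_dinuc_pwm(toy_sites)
best_pos, best_score = max(scan(pwm, "TTTAGGTCATTT"), key=lambda hit: hit[1])
```

In practice a score threshold would be chosen to balance sensitivity against the false-positive rate; how that threshold is set is left open here.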
