Abstract

Background: Transcription factor binding site (TFBS) prediction is a difficult problem, which requires a good scoring function to discriminate between real binding sites and background noise. Many scoring functions have been proposed in the literature, but it is difficult to assess their relative performance, because they are implemented in different software tools using different search methods and different TFBS representations.

Results: Here we compare how several scoring functions perform on both real and semi-simulated data sets in a common test environment. We have also developed two new scoring functions and included them in the comparison. The data sets are from the yeast (S. cerevisiae) genome. Our new scoring function LLBG (least likely under the background model) performs best in this study. It achieves the best average rank for the correct motifs. Scoring functions based on positional bias performed quite poorly in this study.

Conclusion: LLBG may provide an interesting alternative to current scoring functions for TFBS prediction.

Highlights

  • Transcription factor binding site (TFBS) prediction is a difficult problem, which requires a good scoring function to discriminate between real binding sites and background noise

  • The TFBS prediction problem can be defined as follows: Given N hypothetically co-regulated genes and their promoter sequences S = {S1, S2, ..., SN}, search for motifs that are overrepresented in S compared to the set A of all promoter sequences in the genome

  • The scoring functions were tested on eight different yeast data sets (Fig. 1)
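The problem definition in the highlights above (motifs overrepresented in S compared with A) can be illustrated with a small sketch. It is not one of the scoring functions compared in the paper. It assumes exact string matching for a candidate motif, assumes that S is a subset of A, and uses a hypergeometric tail probability as a hypothetical measure of overrepresentation. All function names and the toy sequences are illustrative.

```python
from math import comb


def count_sequences_with_motif(sequences, motif):
    """Count sequences that contain at least one exact occurrence of motif."""
    return sum(1 for seq in sequences if motif in seq)


def hypergeom_tail(k, n, K, M):
    """P(X >= k) when n sequences are drawn without replacement from M
    sequences, K of which contain the motif."""
    upper = min(n, K)
    total = comb(M, n)
    favourable = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def overrepresentation_pvalue(S, A, motif):
    """Tail probability of seeing the motif in at least as many sequences
    of S as observed, if S were a random subset of A (assumption: S is
    contained in A)."""
    k = count_sequences_with_motif(S, motif)
    K = count_sequences_with_motif(A, motif)
    return hypergeom_tail(k, len(S), K, len(A))


# Toy data, invented for illustration only (not from the yeast data sets).
A = ["TTCACGTGAA", "GGCACGTGTT", "TTTTTTTTTT", "GGGGGGGGGG"]
S = A[:2]
p_value = overrepresentation_pvalue(S, A, "CACGTG")
print(p_value)
```

A smaller value indicates that the motif occurs in more of the co-regulated promoters than a random draw from the genome-wide set would suggest. Real scoring functions typically work with degenerate motif representations rather than exact strings.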


Summary

Introduction

Transcription factor binding site (TFBS) prediction is a difficult problem, which requires a good scoring function to discriminate between real binding sites and background noise. Many scoring functions have been proposed in the literature, but it is difficult to assess their relative performance, because they are implemented in different software tools using different search methods and different TFBS representations. For a recent review of both biological and computational aspects of TFBS prediction, see [1]. Many software tools exist for TFBS prediction, e.g. Consensus [3], MEME [4,5], AlignACE [6], BioProspector [7], and MDscan [8]. These tools can be classified according to three criteria: 1. TFBS representation: how a putative TFBS is represented, e.g. consensus sequence [9,10], PSFM (position specific frequency matrix) [7], Bayesian network [11] and HMM [12].
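To make two of the representations named above concrete, the following minimal sketch builds a PSFM from a set of aligned binding sites and derives a consensus sequence from it. The example sites, the function names, and the optional pseudocount parameter are assumptions made for illustration. They are not taken from the paper or from any of the cited tools.

```python
from collections import Counter

ALPHABET = "ACGT"


def build_psfm(sites, pseudocount=0.0):
    """Return one dict per motif position mapping each base to its
    relative frequency among the aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    psfm = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        psfm.append({base: (counts[base] + pseudocount) / total for base in ALPHABET})
    return psfm


def consensus(psfm):
    """Collapse a PSFM to a consensus string by taking the most
    frequent base at each position."""
    return "".join(max(column, key=column.get) for column in psfm)


# Hypothetical aligned sites, invented for illustration only.
sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
psfm = build_psfm(sites)
print(psfm[0])
print(consensus(psfm))
```

The PSFM keeps the per-position variability that the consensus string discards, which is one reason the choice of representation affects how a scoring function behaves.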
