Abstract

Background: Although dozens of algorithms and tools have been developed to find a set of cis-regulatory binding sites called a motif in a set of intergenic sequences using various approaches, most of these tools focus on identifying binding sites that are significantly different from their background sequences. However, some motifs may have a similar nucleotide distribution to that of their background sequences. Therefore, such binding sites can be missed by these tools.

Results: Here, we present a graph-based polynomial-time algorithm, MotifClick, for the prediction of cis-regulatory binding sites, in particular, those that have a similar nucleotide distribution to that of their background sequences. To find binding sites with length k, we construct a graph using some 2(k-1)-mers in the input sequences as the vertices, and connect two vertices by an edge if the maximum number of matches of the local gapless alignments between the two 2(k-1)-mers is greater than a cutoff value. We identify a motif as a set of similar k-mers from a merged group of maximum cliques associated with some vertices.

Conclusions: When evaluated on both synthetic and real datasets of prokaryotes and eukaryotes, MotifClick outperforms existing leading motif-finding tools for prediction accuracy and balancing the prediction sensitivity and specificity in general. In particular, when the distribution of nucleotides of binding sites is similar to that of their background sequences, MotifClick is more likely to identify the binding sites than the other tools.
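The graph construction described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes that a "local gapless alignment" pairs one length-k window of each 2(k-1)-mer and is scored by the number of identical positions, and it swaps in a plain Bron–Kerbosch enumeration of maximal cliques (exponential in the worst case) in place of the paper's polynomial-time clique selection and merging step. The names `max_gapless_matches`, `build_graph`, and `maximal_cliques` are hypothetical.

```python
from itertools import combinations


def max_gapless_matches(a, b, k):
    """Best match count over all pairs of length-k windows of a and b.

    Assumption: each gapless local alignment pairs one length-k window
    from each string, scored by the number of identical positions.
    """
    best = 0
    for i in range(len(a) - k + 1):
        for j in range(len(b) - k + 1):
            matches = sum(x == y for x, y in zip(a[i:i + k], b[j:j + k]))
            best = max(best, matches)
    return best


def build_graph(mers, k, cutoff):
    """Adjacency sets over the indices of the given 2(k-1)-mers.

    Two vertices are joined when their best gapless match count
    is strictly greater than the cutoff.
    """
    adj = {v: set() for v in range(len(mers))}
    for u, v in combinations(range(len(mers)), 2):
        if max_gapless_matches(mers[u], mers[v], k) > cutoff:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting).

    Illustration only: this is a stand-in, not the paper's procedure.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Toy input for k = 4, so each vertex is a 6-mer (2 * (k - 1) = 6).
mers = ["ACGTAA", "ACGTCC", "GGACGT", "TTTTTT"]
adj = build_graph(mers, k=4, cutoff=3)
largest = max(maximal_cliques(adj), key=len)
print(sorted(largest))
```

In this toy input the first three 6-mers all contain the 4-mer ACGT, so they form a clique, while the poly-T vertex stays isolated. A real run would also need the step that extracts the shared k-mers from each clique, which the abstract mentions but does not detail.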

Highlights

  • Dozens of algorithms and tools have been developed to find a set of cis-regulatory binding sites called a motif in a set of intergenic sequences using various approaches, but most of these tools focus on identifying binding sites that are significantly different from their background sequences

  • We present a polynomial-time algorithm, MotifClick, for the problem based on this formulation while considering the distributions of nucleotides in binding sites and their background sequences as well as other statistical properties of binding sites

  • To evaluate the performance of MotifClick, we first compared it with four leading general purpose motif finding tools: BioProspector [19], MEME [20], MotifCut [14], and Weeder [10] on both synthetic and real datasets


Summary

Introduction

Dozens of algorithms and tools have been developed to find a set of cis-regulatory binding sites called a motif in a set of intergenic sequences using various approaches, but most of these tools focus on identifying binding sites that are significantly different from their background sequences. However, some motifs may have a similar nucleotide distribution to that of their background sequences, so such binding sites can be missed by these tools. Identifying cis-regulatory binding sites recognized by transcription factors (TFs) in a genome is the first step towards this goal [2]. Motif-finding algorithms can be largely categorized into “word enumeration” based and “pattern recognition” based methods. The former methods use different strategies to exhaustively enumerate k-mers in the input sequences. WINNOWER [11], CUBIC [12], cWINNOWER [13] and MotifCut [14] use graph-theoretic methods for the enumeration.
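As a rough illustration of what "exhaustively enumerating k-mers" means in its simplest form, the sketch below counts every length-k substring across a set of input sequences. It is a generic example, not the strategy of any tool cited above; the function name `count_kmers` and the toy sequences are hypothetical.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping length-k substring in the input sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Toy input: two short sequences that share the 4-mer ACGT.
counts = count_kmers(["ACGTACGT", "TTACGT"], k=4)
print(counts.most_common(2))
```

Real word-enumeration tools go further than exact counts, for example by allowing mismatches and by scoring words against a background model, which this sketch does not attempt.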

