Abstract

Background: Transcription factor binding site (TFBS) motifs can be accurately represented by position frequency matrices (PFMs) or other equivalent forms. We often need to compare TFBS motifs using their PFMs in order to search for similar motifs in a motif database, or to cluster motifs according to their binding preference. Most current methods for motif comparison combine a similarity metric for column-to-column comparison with a method for finding the optimal positional alignment between the two compared motifs. In some applications, alignment-free methods might be preferred; however, few such methods with high accuracy have been described.

Methodology/Principal Findings: Here we describe a novel alignment-free method for quantifying the similarity of motifs using their PFMs by converting the PFMs into k-mer vectors. The motifs can then be compared by measuring the similarity between their corresponding k-mer vectors.

Conclusions/Significance: We demonstrate that our method generally performs as well as or better than existing methods for clustering motifs according to their binding preference and for identifying similar motifs of transcription factors of the same family.

Highlights

  • Transcription factors (TFs) play important roles in the regulation of gene transcription through binding to specific DNA sequences called TF binding sites (TFBSs), which are usually 5–25 bp in length [1,2]

  • A TFBS motif is often represented by a position frequency matrix (PFM), which consists of nucleotide frequencies at each position of the motif [3]

  • We evaluated our algorithm for identifying the TFBS motifs of structurally and/or evolutionarily related TFs using all three datasets by the "best-hit" approach used in Mahony et al. [8]
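A "best-hit" evaluation can be sketched as follows. This is only one plausible reading: the exact procedure of Mahony et al. [8] is not given here, so the function name `best_hit_accuracy`, the similarity-matrix input, and the family labels are all hypothetical. For each motif, the sketch finds the most similar other motif and counts a success when the two belong to the same TF family.

```python
def best_hit_accuracy(similarity, families):
    """Fraction of motifs whose most similar *other* motif shares their family.

    similarity: square list of lists, similarity[i][j] = score between motifs i and j
                (assumed: higher means more similar).
    families:   list of family labels, one per motif (hypothetical labels).
    """
    n = len(families)
    if n < 2:
        raise ValueError("need at least two motifs to define a best hit")
    correct = 0
    for i in range(n):
        # Best hit = highest-scoring motif other than motif i itself.
        best_j = max((j for j in range(n) if j != i),
                     key=lambda j: similarity[i][j])
        if families[best_j] == families[i]:
            correct += 1
    return correct / n


# Toy example with made-up scores and labels (not data from the paper).
toy_similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.6, 1.0, 0.4],  # motif 2's best hit is motif 1, a different family
    [0.1, 0.2, 0.4, 1.0],
]
toy_families = ["famA", "famA", "famB", "famB"]
toy_accuracy = best_hit_accuracy(toy_similarity, toy_families)
```

Any motif similarity measure can supply the matrix, which is what makes this kind of evaluation convenient for comparing alignment-based and alignment-free methods on the same footing.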


Summary

Introduction

Transcription factors (TFs) play important roles in regulating gene transcription by binding to specific DNA sequences called TF binding sites (TFBSs), which are usually 5–25 bp in length [1,2]. A TFBS motif is often represented by a position frequency matrix (PFM), which consists of nucleotide frequencies at each position of the motif [3]. A PFM is derived from the alignment of known TFBSs of a TF, and it largely reflects the TF's DNA binding preference at each position. In genome-scale TFBS prediction applications, motif finders often return redundant motifs and sub-motifs of the same TF, and these need to be clustered to form unique motifs [9,10]. In such applications, the similarity between two motifs needs to be calculated accurately.
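The idea of comparing motifs through k-mer vectors rather than through an explicit alignment can be illustrated with a minimal sketch. The details below are assumptions, not the authors' published procedure: each PFM is taken as a list of columns of [A, C, G, T] counts, the weight of a k-mer is its average probability over all length-k windows of the motif (treating positions as independent), and vectors are compared by cosine similarity. Reverse complements and pseudocounts are ignored.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"


def normalize_columns(pfm):
    """Turn each column of counts [A, C, G, T] into probabilities."""
    return [[count / sum(col) for count in col] for col in pfm]


def kmer_vector(pfm, k):
    """Map a PFM to a dict {k-mer: average probability over all windows}.

    Assumption: positions are independent, so the probability of a k-mer
    in a window is the product of the per-position base probabilities.
    """
    probs = normalize_columns(pfm)
    n_windows = len(probs) - k + 1
    if n_windows < 1:
        raise ValueError("motif is shorter than k")
    vector = {}
    for kmer in product(range(4), repeat=k):
        total = 0.0
        for start in range(n_windows):
            p = 1.0
            for offset, base in enumerate(kmer):
                p *= probs[start + offset][base]
            total += p
        vector["".join(BASES[b] for b in kmer)] = total / n_windows
    return vector


def cosine_similarity(u, v):
    """Cosine similarity of two k-mer vectors built with the same k."""
    dot = sum(u[key] * v[key] for key in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


# Toy motifs (made up): "ACG" and the shifted "CGT" share the 2-mer "CG".
motif_acg = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0]]
motif_cgt = [[0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
vec_acg = kmer_vector(motif_acg, k=2)
vec_cgt = kmer_vector(motif_cgt, k=2)
shifted_similarity = cosine_similarity(vec_acg, vec_cgt)
```

Because the two toy motifs overlap by a shifted "CG", they receive a nonzero similarity without any positional alignment step, which is the property an alignment-free comparison relies on.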

