Abstract

Background: Discovering transcription factor binding sites (TFBSs) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be expressed accurately in matrix form, such as a position-specific scoring matrix (PSSM) or a position frequency matrix (PFM). Very often, we need to query a database of motifs for those similar to a given motif, merge similar TFBS motifs that may be recognized by the same TF, separate unrelated motifs, or filter out spurious motifs. All these applications require a metric that captures slight differences between unrelated motifs and highlights the similarity between motifs of the same group. Although several motif similarity metrics have already been proposed, their performance is still far from satisfactory for these applications.

Methods: We propose a novel metric named SPIC (Similarity with Position Information Contents) for measuring the similarity between a column of one motif and a column of another motif. In defining this similarity score, we consider the likelihood that the column of the first motif's PFM can be produced by the column of the second motif's PSSM, multiply that likelihood by the information content of the column of the second motif's PSSM, and vice versa. To compute the similarity between two motifs, we evaluated SPIC combined with a local or a global alignment method that uses an affine gap penalty function. We also compared SPIC with seven existing state-of-the-art metrics on three datasets, assessing their ability to cluster motifs from the same group and to retrieve motifs from a database.

Results: When used jointly with the Smith-Waterman local alignment method with an affine gap penalty function (gap open penalty of 1, gap extension penalty of 0.5), SPIC outperforms the seven existing state-of-the-art motif similarity metrics, each combined with its best alignment method, at matching motifs in database searches, clustering sub-motifs of the same TF, and distinguishing relevant motifs from a miscellaneous group of motifs.

Conclusions: We have developed a novel motif similarity metric that matches motifs in database searches more accurately, and clusters similar motifs and differentiates unrelated motifs more effectively, than the seven other metrics we are aware of.
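The column score and alignment settings described in the abstract can be illustrated with a minimal Python sketch. It is one plausible reading of the verbal description, not the published SPIC formula: the "likelihood" is taken here to be the expected log-odds of one column's frequencies under the other column's PSSM, and the uniform background, the pseudocount, and every function name are assumptions made for illustration. The alignment is a score-only Smith-Waterman recurrence with affine gaps, under the assumed convention that the first gapped position costs the open penalty (1) and each further position costs the extension penalty (0.5).

```python
import math

BACKGROUND = 0.25   # assumption: uniform background over A, C, G, T
PSEUDOCOUNT = 0.01  # assumption: small pseudocount to avoid log(0)

def to_probs(counts):
    """Turn one PFM column (counts for A, C, G, T) into probabilities."""
    total = sum(counts) + 4 * PSEUDOCOUNT
    return [(c + PSEUDOCOUNT) / total for c in counts]

def information_content(probs):
    """Information content in bits, relative to a uniform background."""
    return 2.0 + sum(p * math.log2(p) for p in probs if p > 0)

def directed_score(col_x, col_y):
    """Assumed form: expected log-odds of col_x under col_y's PSSM column,
    weighted by the information content of col_y."""
    p = to_probs(col_x)
    q = to_probs(col_y)
    log_odds = [math.log2(qb / BACKGROUND) for qb in q]
    likelihood = sum(pb * s for pb, s in zip(p, log_odds))
    return likelihood * information_content(q)

def column_score(col_a, col_b):
    """Symmetric score: the directed score plus its 'vice versa' counterpart."""
    return directed_score(col_a, col_b) + directed_score(col_b, col_a)

def local_align(motif_a, motif_b, gap_open=1.0, gap_extend=0.5):
    """Best local alignment score between two motifs (lists of PFM columns)."""
    n, m = len(motif_a), len(motif_b)
    neg = float("-inf")
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in motif_a
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in motif_b
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + column_score(motif_a[i - 1], motif_b[j - 1])
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best

# Toy motifs: each column holds counts for A, C, G, T.
motif_acg = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0]]
motif_ttt = [[0, 0, 0, 10], [0, 0, 0, 10], [0, 0, 0, 10]]
self_score = local_align(motif_acg, motif_acg)
cross_score = local_align(motif_acg, motif_ttt)
```

With this form, two well-conserved columns that agree receive a large positive score, while conserved columns that disagree receive a negative one, so the local alignment naturally stops at unrelated positions. A global alignment variant would drop the zero floor in the recurrence and read the score from the final cell instead.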

Highlights

  • Discovering transcription factor binding sites (TFBSs) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome

  • Transcriptional regulation is triggered by the binding of TF proteins to specific DNA sequences of 6-25 base pairs (bp), called cis-regulatory elements (CREs) or transcription factor binding sites (TFBSs), in a gene’s promoter region or in remote regulatory regions such as enhancers, silencers and insulators [3]

  • Motif retrieval: given the profile of a motif whose cognate TF is unknown, a frequently used application is to search for the motif in a database


Summary

Introduction

Discovering transcription factor binding sites (TFBSs) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be expressed accurately in matrix form, such as a position-specific scoring matrix (PSSM) or a position frequency matrix (PFM). Transcriptional regulation is triggered by the binding of TF proteins to specific DNA sequences of 6-25 base pairs (bp), called cis-regulatory elements (CREs) or transcription factor binding sites (TFBSs), in a gene’s promoter region or in remote regulatory regions such as enhancers, silencers and insulators [3]. These TF-DNA interactions in a cell form the transcriptional regulatory network (TRN) of the cell [4]. We can use either of the two matrices to scan sequences that potentially contain TFBSs in order to discover them [10].
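Scanning a sequence with a motif matrix, as mentioned above, can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure of any particular tool: the log-odds conversion, the pseudocount of 0.5, the uniform background of 0.25, and the function names `pfm_to_pssm`, `scan` and `best_hit` are all assumptions.

```python
import math

BASES = "ACGT"  # column order assumed for every matrix row

def pfm_to_pssm(pfm, pseudocount=0.5, background=0.25):
    """Convert a PFM (one [A, C, G, T] count list per position) to log-odds scores."""
    pssm = []
    for counts in pfm:
        total = sum(counts) + 4 * pseudocount
        pssm.append([math.log2(((c + pseudocount) / total) / background)
                     for c in counts])
    return pssm

def scan(sequence, pssm):
    """Score every window of the motif's width; returns (start, score) pairs.
    Raises ValueError if the sequence contains a character outside ACGT."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[k][BASES.index(base)] for k, base in enumerate(window))
        hits.append((start, score))
    return hits

def best_hit(sequence, pssm):
    """Return the highest-scoring (start, score) pair."""
    return max(scan(sequence, pssm), key=lambda hit: hit[1])

# Toy PFM built from eight hypothetical sites that all read "TATA".
tata_pfm = [[0, 0, 0, 8], [8, 0, 0, 0], [0, 0, 0, 8], [8, 0, 0, 0]]
tata_pssm = pfm_to_pssm(tata_pfm)
start, score = best_hit("GGCTATAGC", tata_pssm)
```

In practice, a score threshold would decide which windows are reported as candidate TFBSs, and the reverse-complement strand would usually be scanned as well; both are left out to keep the sketch short.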

