Finding Motifs with Insufficient Number of Strong Binding Sites

Henry C.M. Leung, Roni Rosenfeld, S.M. Yiu, W.W. Tsang, Francis Y.L. Chin

doi:10.1089/cmb.2005.12.686


Open Access

https://doi.org/10.1089/cmb.2005.12.686


Abstract

A transcription factor is a molecule that usually binds to a set of promoter sequences of coexpressed genes. As a result, these promoter sequences contain short substrings, called binding sites, with similar patterns. The motif discovery problem is to find these similar patterns, or motifs, in a set of sequences. Most existing algorithms find motifs using strong-signal sequences only (i.e., those containing binding sites very similar to the motif). In this paper, we represent a motif by a probability matrix and use it to calculate the minimum total number of binding sites that the input dataset must contain in order to confirm that the discovered motifs are not artifacts. Next, we introduce a more general and realistic energy-based model, which considers all sequences with varying degrees of binding strength to the transcription factor (as measured experimentally). Using this model, we develop a heuristic algorithm called EBMF (Energy-Based Motif Finding Algorithm), which can handle sequences ranging from those that contain more than one binding site to those that contain none. EBMF can find motifs in datasets that do not even have the minimum number of binding sites derived above. EBMF compares favorably with the common motif-finding programs AlignACE and MEME. In particular, for some simulated and real datasets, EBMF finds the motif when both AlignACE and MEME fail to do so.
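
To make the probability-matrix representation mentioned in the abstract concrete, the following is a minimal sketch of how a candidate site can be scored against such a matrix and how the best-scoring window in a sequence can be located. It is not the EBMF algorithm, whose details are not given here. The function names, the uniform background distribution, the log-odds scoring, and the toy matrix `TOY_MATRIX` are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

# Hypothetical 4-column probability matrix (one dict per motif position).
# Each column sums to 1; the consensus of this toy motif is "TACG".
# All entries are kept positive so that the logarithm is always defined.
TOY_MATRIX = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]


def log_odds_score(site, prob_matrix, background=None):
    """Sum over positions of log2(motif probability / background probability)."""
    if background is None:
        # Assumption: uniform background over the four bases.
        background = {base: 0.25 for base in BASES}
    if len(site) != len(prob_matrix):
        raise ValueError("site length must equal the number of matrix columns")
    return sum(
        math.log2(column[base] / background[base])
        for base, column in zip(site, prob_matrix)
    )


def best_window(sequence, prob_matrix, background=None):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(prob_matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    starts = range(len(sequence) - width + 1)
    scores = [
        log_odds_score(sequence[i:i + width], prob_matrix, background)
        for i in starts
    ]
    start = max(starts, key=lambda i: scores[i])  # first maximum on ties
    return start, scores[start]


if __name__ == "__main__":
    start, score = best_window("GGTACGTT", TOY_MATRIX)
    print(start, round(score, 3))
```

A positive score means the window looks more like the motif than like background; a strong-signal method would keep only windows with high scores, whereas the energy-based model described in the abstract takes sequences of every binding strength into account.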

Highlights

One great challenge in molecular biology is to understand the regulation of gene expression - the process by which a segment of DNA is decoded to form a protein
According to the results of Buhler and Tompa [Buhler 2002], the number of these sequences is far below the minimum number of input sequences required, which is 4, so it should be theoretically impossible to find the motif for this input set (we set n = 787, t = 3, l = 13 and d = 2). We tested this input set on two common motif-finding programs, AlignACE [Hughes 2000, Roth 1998] and MEME [Bailey 1994], which are based on the strong-signal model
Let m be the total number of sequences, n the length of each sequence, t the number of sequences with binding sites, and B∗ the number of binding sites in the t sequences. We generated the simulated data as follows

Summary

Introduction

One great challenge in molecular biology is to understand the regulation of gene expression, the process by which a segment of DNA is decoded to form a protein. An mRNA molecule is formed by copying a gene from the DNA (transcription), and the mRNA is then decoded to produce a protein. To start the transcription process for a particular gene, one or more corresponding proteins, called transcription factors, have to bind to several specific regions, called binding sites, in the promoter region of the gene. A transcription factor can bind to multiple binding sites, but these sites typically have similar lengths (usually about 8 to 20 bp) and a common DNA sequence pattern. The common patterns of these binding sites, referred to as motifs, are still unknown. Although many laboratory-based methods for motif identification have been developed, these experimental methods are both expensive and time-consuming.

Methods

Results

Conclusion


Journal: Journal of Computational Biology	Publication Date: Jul 1, 2005
Citations: 37	License type: cc-by
