Direct vs 2-stage approaches to structured motif finding

Maria Federico,Paolo Valente,Manuela Montangero,Mauro Leoncini

doi:10.1186/1748-7188-7-20

Abstract

Background: The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical counterpart that models sets of TFBSs that work in concert in the gene regulation processes of higher eukaryotic organisms. Typically, a structured motif is composed of an ordered set of isolated (or simple) motifs, separated by a variable, but somewhat constrained, number of “irrelevant” base-pairs. Discovering structured motifs in a set of DNA sequences is a computationally hard problem that has been addressed by a number of authors using either a direct approach, or via the preliminary identification and successive combination of simple motifs.

Results: We describe a computational tool, named SISMA, for the de novo discovery of structured motifs in a set of DNA sequences. SISMA is an exact, enumerative algorithm, meaning that it finds all the motifs conforming to the specifications. It does so in two stages: first it discovers all the possible component simple motifs, then it combines them in a way that respects the given constraints. We developed SISMA mainly with the aim of understanding the potential benefits of such a 2-stage approach with respect to direct methods. In fact, no 2-stage software was available for the general problem of structured motif discovery, but only a few tools that solved restricted versions of the problem. We evaluated SISMA against other published tools on a comprehensive benchmark made of both synthetic and real biological datasets. In a significant number of cases, SISMA outperformed the competitors, and it also performed well in most of the cases in which it was inferior.

Conclusions: A reflection on the results obtained led us to conclude that a 2-stage approach can be implemented with many advantages over direct approaches. Some of these have to do with greater modularity, ease of parallelization, and the possibility to perform adaptive searches of structured motifs. As another consideration, we noted that most hard instances for SISMA were easy to detect in advance. In these cases one may initially opt for a direct method; or, as a viable alternative in most laboratories, one could run both direct and 2-stage tools in parallel, halting both computations as soon as the first one finishes.
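The suggestion of running a direct and a 2-stage tool side by side, and stopping as soon as one of them finishes, can be sketched as a small process race. This is only an illustration under stated assumptions: the function name `race` and the two stand-in commands are hypothetical, and real motif finders would be launched with their own command lines, which are not given here.

```python
import subprocess
import sys
import time


def race(commands, poll_interval=0.05):
    """Start every command; return (index, return code) of the first one
    to exit, and terminate the others."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for i, proc in enumerate(procs):
                code = proc.poll()  # None while the process is still running
                if code is not None:
                    return i, code
            time.sleep(poll_interval)
    finally:
        # Stop whichever tools are still running, then reap all processes.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
            proc.wait()


# Hypothetical stand-ins for a slow tool and a fast tool on some instance.
slow_tool = [sys.executable, "-c", "import time; time.sleep(5)"]
fast_tool = [sys.executable, "-c", "pass"]
winner, returncode = race([slow_tool, fast_tool])
```

Polling keeps the sketch short and portable; the cost is a small delay, bounded by `poll_interval`, between the first tool finishing and the other being stopped.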

Highlights

The notion of DNA motif is a mathematical abstraction used to model regions of the DNA that are bound by a given Transcription Factor to regulate gene expression or repression
Being able to identify transcription factor binding sites (TFBSs) is crucial to our understanding of the mechanisms that regulate gene expression, and of the functions of individual genes regulated by newly discovered TFBSs [2]
We present a novel 2-stage algorithm, called SISMA, that enumerates all the structured motifs conforming to input specifications
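The 2-stage idea described in the abstract (first find the simple motifs, then combine them under the spacing constraints) can be illustrated with a toy sketch. It is not SISMA's algorithm: it assumes exact matching with no mutations, only two components, and a quorum counted in sequences. The function names, parameters, and example sequences are all made up for illustration.

```python
from collections import defaultdict
from itertools import product


def simple_motifs(seqs, k, quorum):
    """Stage 1: map each k-mer found in >= quorum sequences to its
    occurrences, stored as {sequence index: [start positions]}."""
    occ = defaultdict(lambda: defaultdict(list))
    for i, s in enumerate(seqs):
        for p in range(len(s) - k + 1):
            occ[s[p:p + k]][i].append(p)
    return {m: dict(by_seq) for m, by_seq in occ.items()
            if len(by_seq) >= quorum}


def structured_motifs(seqs, k1, k2, dmin, dmax, quorum):
    """Stage 2: pair simple motifs whose occurrences are separated by a
    gap of dmin..dmax bases in at least `quorum` sequences."""
    first = simple_motifs(seqs, k1, quorum)
    second = simple_motifs(seqs, k2, quorum)
    found = []
    for (m1, occ1), (m2, occ2) in product(first.items(), second.items()):
        support = 0
        # Only sequences containing both components can support the pair.
        for i in occ1.keys() & occ2.keys():
            starts2 = set(occ2[i])
            if any(p + k1 + d in starts2
                   for p in occ1[i] for d in range(dmin, dmax + 1)):
                support += 1
        if support >= quorum:
            found.append((m1, m2))
    return sorted(found)


# Made-up sequences: ACGT is followed by GGCA after a gap of 2, 3 and 4 bases.
seqs = ["TTACGTAAGGCATT", "CCACGTCCCGGCAA", "GACGTTTTTGGCAG"]
result = structured_motifs(seqs, k1=4, k2=4, dmin=2, dmax=4, quorum=3)
print(result)  # [('ACGT', 'GGCA')]
```

Because stage 1 runs independently for each component, its output can be computed once and reused when only the gap constraints change. That is one way to read the modularity argument, though the sketch makes no claim about how SISMA does it.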

Summary

Introduction

The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor (TF) to regulate gene expression or repression. In lower eukaryotes, TFBSs are usually short DNA strings (5-25 base pairs long), bound by a single TF, that frequently appear, possibly with some mutations, upstream of the transcription start site (TSS) in the proximal promoter region of co-regulated genes. There may be multiple binding sites for a single TF in a single gene’s promoter region; there can be great variability in the binding sites of a single TF; and the regulatory elements may be located several kilobases away from the TSS, either upstream or downstream or in the introns of the genes that they regulate [3], in which case they are often organized in functional groups (called cis-regulatory modules) [2] bound by several interacting TFs in a cooperative or antagonistic way. Mutations in TFBSs underlie several degenerative human diseases (e.g., all forms of cancer) and constitute a substantial component of the phenotypic variability within and across species [5].

Journal: Algorithms for Molecular Biology	Publication Date: Aug 21, 2012
Citations: 40	License type: cc-by

Similar Papers

SMOTIF: efficient structured pattern and profile motif search.
Yongqiang Zhang ... Mohammed J Zaki
Algorithms for Molecular Biology | VOL. 1
21 Nov 2006

NSAMD: A new approach to discover structured contiguous substrings in sequence datasets using Next-Symbol-Array
Abdolvahed Pari ... Saeed Parseh
Computational Biology and Chemistry | VOL. 64
06 Sep 2016

Automated extraction of extended structured motifs using multi-objective genetic algorithm
Mehmet Kaya
Expert Systems With Applications | VOL. 37
09 Jul 2009

An efficient algorithm for planted structured motif extraction
Maria Federico ... Roberto Cavicchioli
18 May 2009
