A hybrid method for the exact planted (l, d) motif finding problem and its parallelization

Mostafa M Abbas, Mohamed Abouelhoda, Hazem M Bahig

doi:10.1186/1471-2105-13-s17-s10


Open Access

https://doi.org/10.1186/1471-2105-13-s17-s10


Abstract

Background: Given a set of DNA sequences s1, ..., st, the (l, d) motif problem is to find an l-length motif sequence M, not necessarily occurring in any of the input sequences, such that each sequence si, 1 ≤ i ≤ t, contains at least one subsequence that differs from M by at most d mismatches. Many exact algorithms have been developed to solve the motif finding problem over the last three decades. However, the problem remains challenging, and exact solutions are limited to small values of l and d.

Results: In this paper we present a new, efficient method to improve the performance of exact algorithms for the motif finding problem. Our method consists of two main steps. First, we process q ≤ t sequences to find candidate motifs. Second, we search for the candidate motifs in the remaining sequences. For both steps, we use the best available algorithms. The method is hybrid because it integrates existing algorithms to achieve the best running time. We show how to determine the value of q that yields the best running time. Our experimental results show a speed-up of about 24% over the best existing algorithm. Furthermore, we present a parallel version of our method that runs on shared-memory architectures. Our experiments show that the performance of our algorithm scales linearly with the number of processors. Using the parallel version, we solved the challenging (21, 8) instance on 8 processors in 20.42 hours, compared with 6.68 days for the serial version.

Conclusions: Our method speeds up the solution of the exact motif problem. It is generic, because it can accommodate any new, faster algorithm based on traditional methods. We expect that our method will help to discover longer motifs. The software we developed is freely available for academic research at http://www.nubios.nileu.edu.eg/tools/hymotif.
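
The timings reported for the (21, 8) instance can be turned into a speed-up and a parallel efficiency with a few lines of arithmetic. The Python sketch below only restates the two figures from the abstract (6.68 days serial, 20.42 hours on 8 processors); the function name `parallel_speedup` is illustrative and does not come from the paper.

```python
def parallel_speedup(serial_hours, parallel_hours, processors):
    """Return (speedup, efficiency) for a parallel run.

    speedup    = serial time / parallel time
    efficiency = speedup / number of processors (1.0 means perfectly linear)
    """
    speedup = serial_hours / parallel_hours
    efficiency = speedup / processors
    return speedup, efficiency


# Figures quoted in the abstract for the (21, 8) instance.
serial_hours = 6.68 * 24   # 6.68 days expressed in hours
parallel_hours = 20.42     # wall-clock time on 8 processors

speedup, efficiency = parallel_speedup(serial_hours, parallel_hours, 8)
print(f"speed-up: {speedup:.2f}x, efficiency: {efficiency:.2f}")
```

With these inputs the speed-up comes out at roughly 7.85 on 8 processors, an efficiency of about 0.98, which is consistent with the claim of linear scaling.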

Highlights

Given a set of DNA sequences s1, ..., st, the (l, d) motif problem is to find an l-length motif sequence M, not necessarily occurring in any of the input sequences, such that each sequence si, 1 ≤ i ≤ t, contains at least one subsequence that differs from M by at most d mismatches.
Our contribution: In previous work [32,33], we introduced a two-stage idea to speed up the exact algorithms. In the first stage, we generate a set of candidate motifs by applying one of the exact algorithms based on the neighbourhood method to q ≤ t sequences.
We propose a parallel version of our algorithm as a practical solution to the challenging instances of the motif problem.
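
The two-stage idea described above can be illustrated with a small Python sketch. It is a toy stand-in under stated assumptions, not the authors' implementation: stage 1 uses a plain d-neighbourhood enumeration on the first q sequences, stage 2 uses a direct scan of the remaining sequences, and the names `neighbourhood`, `occurs_within` and `hybrid_motif_search` are hypothetical. The paper instead plugs in the best available exact algorithm for each stage.

```python
ALPHABET = "ACGT"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbourhood(kmer, d):
    """All strings over ALPHABET within Hamming distance d of kmer."""
    if d == 0 or not kmer:
        return {kmer}
    result = set()
    # Keep the first character; the tail may still use up to d mismatches.
    for tail in neighbourhood(kmer[1:], d):
        result.add(kmer[0] + tail)
    # Allow any first character; the tail may use at most d - 1 mismatches.
    for tail in neighbourhood(kmer[1:], d - 1):
        for c in ALPHABET:
            result.add(c + tail)
    return result


def occurs_within(motif, seq, d):
    """True if some l-mer of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def hybrid_motif_search(seqs, l, d, q):
    """Two-stage search (assumes 1 <= q <= len(seqs))."""
    # Stage 1: candidate motifs from the first q sequences only.
    first = seqs[0]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= neighbourhood(first[i:i + l], d)
    candidates = {m for m in candidates
                  if all(occurs_within(m, s, d) for s in seqs[1:q])}
    # Stage 2: keep candidates that also occur in the remaining t - q sequences.
    return sorted(m for m in candidates
                  if all(occurs_within(m, s, d) for s in seqs[q:]))


seqs = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]
motifs = hybrid_motif_search(seqs, l=4, d=1, q=2)
print("ACGT" in motifs)
```

In this sketch the choice of q changes only how the work is split between the two stages, not the final set of motifs.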

Summary

Introduction

Given a set of DNA sequences s1, ..., st, the (l, d) motif problem is to find an l-length motif sequence M, not necessarily occurring in any of the input sequences, such that each sequence si, 1 ≤ i ≤ t, contains at least one subsequence that differs from M by at most d mismatches. DNA motifs are short sequences in the genome that play important functional roles in gene regulation. Because they are so short, it is difficult to identify these regions using features intrinsic to their composition. Formally, the consensus motif problem is to find an l-length motif sequence M such that each sequence si, 1 ≤ i ≤ t, contains at least one subsequence pi that differs from M by at most d mismatches; i.e., dH(pi, M) ≤ d, where dH is the Hamming distance between pi and M.
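
The definition above can be checked directly for a given candidate M. The following Python sketch is a minimal illustration, assuming each pi is a contiguous l-length window of si; the helper names `best_occurrence` and `is_ld_motif` and the example sequences are made up for this illustration.

```python
def hamming(a, b):
    """Hamming distance dH between two strings of equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def best_occurrence(seq, motif):
    """Return (distance, position) of the closest l-length window of seq to motif.

    Ties are resolved in favour of the leftmost position.
    Raises ValueError if seq is shorter than motif.
    """
    l = len(motif)
    return min((hamming(seq[i:i + l], motif), i)
               for i in range(len(seq) - l + 1))


def is_ld_motif(seqs, motif, d):
    """True if every sequence has a window pi with dH(pi, motif) <= d."""
    return all(best_occurrence(s, motif)[0] <= d for s in seqs)


seqs = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]
print(best_occurrence(seqs[1], "ACGT"))  # (1, 2): window "ACCT" at index 2
print(is_ld_motif(seqs, "ACGT", d=1))    # True
print(is_ld_motif(seqs, "ACGT", d=0))    # False
```

Here "ACGT" appears exactly in the first sequence and with one mismatch in the other two, so it is a (4, 1) motif of this toy input but not a (4, 0) motif.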

Methods

Results

Conclusion


Journal: BMC Bioinformatics	Publication Date: Dec 1, 2012
Citations: 52	License type: CC BY 2.0


Similar Papers

Comparison of Simple Encoding Schemes in GA’s for the Motif Finding Problem: Preliminary Results
Giovanna Martínez-Arellano ... Carlos A Brizuela
01 Jan 2007

Massively Parallelized DNA Motif Search on FPGA
Yasmeen Farouk ... Hossam Faheem
02 Nov 2011

Parallelizing exact motif finding algorithms on multi-core
Mostafa M Abbas ... Hazem M Bahig
The Journal of Supercomputing | VOL. 69
13 Apr 2014

A particle swarm optimization solution for challenging planted(l, d)-Motif problem
U Srinivasulu Reddy ... A V Reddy
01 Apr 2013
