Abstract
We develop an efficient multicore algorithm, PMS6MC, for the (l, d)-motif discovery problem in which we are to find all strings of length l that appear in every string of a given set of strings with at most d mismatches. PMS6MC is based on PMS6, which is currently the fastest single-core algorithm for motif discovery in large instances. The speedup, relative to PMS6, attained by our multicore algorithm ranges from a high of 6.62 for the (17,6) challenging instances to a low of 2.75 for the (13,4) challenging instances on an Intel 6-core system. We estimate that PMS6MC is 2 to 4 times faster than other parallel algorithms for motif search on large instances.
Highlights
Motifs are patterns found in biological sequences
We compare the run times of PMS6 and PMS6MC on an Intel 6-core system with each core running at 3.3 GHz
The speedup achieved by PMS6MC over PMS6 varies from a low of 2.75 for (13,4) instances to a high of 6.62 for (17,6) instances
Summary
Motifs are patterns found in biological sequences. These common patterns in different sequences help in understanding gene functions and can lead to the design of better drugs to combat diseases. We consider the version known as Planted Motif Search (PMS), or the (l, d)-motif search problem. In PMS, given n input strings and two integers l and d, we aim to find all strings M of length l (referred to as l-mers) that occur in every input string with at most d mismatches. For an l-mer M to be a motif for the n input strings, each of those strings must contain a substring that is in the d-neighborhood of M, that is, a substring of length l that differs from M in at most d positions.