Abstract

We develop an efficient multicore algorithm, PMS6MC, for the (l, d)-motif discovery problem, in which the goal is to find all strings of length l that appear, with at most d mismatches, in every string of a given set. PMS6MC is based on PMS6, which is currently the fastest single-core algorithm for motif discovery in large instances. On an Intel 6-core system, the speedup of PMS6MC relative to PMS6 ranges from a low of 2.75 for the (13,4) challenging instances to a high of 6.62 for the (17,6) challenging instances. We estimate that PMS6MC is 2 to 4 times faster than other parallel algorithms for motif search on large instances.

Highlights

  • Motifs are patterns found in biological sequences

  • We compare the run times of PMS6 and PMS6MC on an Intel 6-core system with each core running at 3.3 GHz

  • The speedup achieved by PMS6MC over PMS6 varies from a low of 2.75 for (13,4) instances to a high of 6.62 for (17,6) instances


Summary

Introduction

Motifs are patterns found in biological sequences. These common patterns in different sequences help in understanding gene functions and lead to the design of better drugs to combat diseases. We consider the version known as Planted Motif Search (PMS), or the (l, d) motif search problem. In PMS, given n input strings and two integers l and d, we aim to find all strings M of length l (referred to as l-mers) that occur in every input string with at most d mismatches. For an l-mer M to be a motif for the n input strings, each of those strings must contain a substring that is in the d-neighborhood of M, that is, an l-mer that differs from M in at most d positions.
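The definition above can be illustrated with a brute-force check. The following is a minimal sketch of the (l, d) motif definition only, not of PMS6 or PMS6MC; the function names (`hamming`, `occurs_within`, `naive_motifs`) and the DNA alphabet `ACGT` are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(m, s, d):
    """True if some l-mer of s lies in the d-neighborhood of m."""
    l = len(m)
    return any(hamming(m, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def naive_motifs(strings, l, d, alphabet="ACGT"):
    """Enumerate every l-mer over the alphabet and keep those that
    occur, with at most d mismatches, in every input string.

    Illustrative only: the candidate space has len(alphabet)**l entries,
    so this is practical only for very small l.
    """
    motifs = []
    for candidate in product(alphabet, repeat=l):
        m = "".join(candidate)
        if all(occurs_within(m, s, d) for s in strings):
            motifs.append(m)
    return motifs
```

For example, `naive_motifs(["AC", "AG"], 2, 1)` keeps exactly those 2-mers that are within one mismatch of both `AC` and `AG`. Because the enumeration grows exponentially in l, a sketch like this is useful only for checking the definition on toy inputs.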
