Abstract
Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motif discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements, which are crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such version is the (ℓ, d)-motif search, also called Planted Motif Search (PMS). A generalized version of the PMS problem, namely Quorum Planted Motif Search (qPMS), has been shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worst case. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as on challenging instances. Experimental results show that Algorithm qPMS7 is on average 5 times faster than the state-of-the-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com.
Highlights
Detection of rare events happening in a set of DNA/protein sequences often provides the main clue leading to new biological discoveries
In this paper we have presented Algorithm qPMS7 for the Quorum Planted Motif Search (qPMS) problem and tested it on DNA as well as protein sequences
Experimental results indicate that Algorithm qPMS7 is faster than other existing algorithms, especially for large values of ℓ and d
Summary
Detection of rare events happening in a set of DNA/protein sequences often provides the main clue leading to new biological discoveries. All known PMS algorithms (both exact and approximate) are only able to find (ℓ, d)-motifs up to certain values of ℓ and d. The qPMS problem is to find all the motifs that have motif instances present in q out of the n input sequences. qPMS algorithms can be used to find DNA motifs and protein motifs as well as transcription factor binding sites. To the best of our knowledge, the best exact qPMS algorithm to date is Algorithm qPMSPrune due to [6], which can only solve instances up to ℓ = 17 and d = 5 for q = n/2, where n is the number of input sequences. When applied to the PMS problem, our algorithm is faster than the best PMS algorithm, i.e., Algorithm PMS6 due to [3].
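To make the qPMS problem statement concrete, the following is a minimal brute-force sketch in Python. It is not Algorithm qPMS7 or qPMSPrune; it only illustrates the definition by enumerating every candidate string of length ℓ, so it runs in time exponential in ℓ and is usable only for tiny inputs. It assumes the usual reading of the problem: a candidate is reported when at least q of the n sequences contain a substring of length ℓ within Hamming distance d of it. The function names (`hamming`, `occurs_within`, `brute_force_qpms`) and the default DNA alphabet are illustrative choices, not taken from the paper.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, sequence, d):
    """True if some substring of `sequence` with the candidate's length
    is within Hamming distance d of `candidate`."""
    ell = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + ell]) <= d
        for i in range(len(sequence) - ell + 1)
    )


def brute_force_qpms(sequences, ell, d, q, alphabet="ACGT"):
    """Return every length-ell string over `alphabet` that has an instance
    (within Hamming distance d) in at least q of the input sequences.

    Assumption: "present in q out of n" is read as "at least q".
    Enumerates len(alphabet)**ell candidates, so only for small ell.
    """
    motifs = []
    for letters in product(alphabet, repeat=ell):
        candidate = "".join(letters)
        # Count the sequences that contain an instance of the candidate.
        support = sum(occurs_within(candidate, s, d) for s in sequences)
        if support >= q:
            motifs.append(candidate)
    return motifs
```

For example, `brute_force_qpms(["ACGTACGT", "TTACGTTT", "GGGGGGGG"], 4, 0, 2)` returns `["ACGT", "TACG"]`, the two 4-mers that occur exactly in two of the three sequences. Setting q = n recovers the plain PMS problem, where every sequence must contain an instance.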