Abstract

A DNA motif is a pattern shared by similar fragments of DNA sequences. Motifs play a key role in regulating gene expression, which has made DNA motif discovery an important research topic. Exact planted (l,d)-motif search (PMS) is one approach to motif discovery: given t sequences, it aims to find all (l,d)-motifs, that is, patterns of length l that appear with at most d mismatches in at least qt of the sequences. Existing exact PMS algorithms are suitable only for small DNA sequence datasets. High-throughput sequencing technology now generates vast amounts of DNA sequence data, which makes solving exact PMS problems efficiently a challenge. After analyzing the time complexity of existing exact PMS algorithms, we therefore propose PMmotif, an efficient exact PMS algorithm for large DNA sequence datasets. PMmotif finds (l,d)-motifs by searching only those branches of the pattern tree that may contain (l,d)-motifs. Experiments show that the leading existing PMS algorithms take between 14.83 and 58.94 times as long as PMmotif. In addition, PMmotif is the first algorithm able to solve the (15,5) and (17,6) challenge instances on large DNA sequence datasets within 24 hours.
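To make the (l,d)-motif definition and the idea of pruning a pattern tree concrete, here is a minimal Python sketch. It is not the authors' PMmotif algorithm. It is a generic depth-first search over the tree of DNA strings, and it uses one simple, safe pruning rule that is assumed here purely for illustration. If a length-l string M occurs in a sequence with at most d mismatches, then every prefix of M also matches the start of that same window with at most d mismatches. A prefix whose support falls below qt can therefore be discarded together with its whole subtree. All function names are hypothetical, and q is assumed to be a fraction in (0, 1].

```python
ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(prefix: str, seq: str, l: int, d: int) -> bool:
    """True if `prefix` (of length k <= l) matches the first k bases of some
    length-l window of `seq` with at most d mismatches."""
    k = len(prefix)
    for j in range(len(seq) - l + 1):
        if hamming(prefix, seq[j:j + k]) <= d:
            return True
    return False


def support(prefix: str, seqs: list[str], l: int, d: int) -> int:
    """Number of sequences in which `prefix` occurs (see `occurs`)."""
    return sum(occurs(prefix, s, l, d) for s in seqs)


def is_motif(m: str, seqs: list[str], d: int, q: float) -> bool:
    """Definition check: m is an (l,d)-motif (l = len(m)) if it occurs with
    at most d mismatches in at least q*t of the t sequences."""
    return support(m, seqs, len(m), d) >= q * len(seqs)


def pattern_tree_search(seqs: list[str], l: int, d: int, q: float) -> list[str]:
    """Hypothetical pattern-tree search: extend prefixes base by base and
    skip every branch whose prefix is already supported by fewer than q*t
    sequences, since no string in that branch can be an (l,d)-motif."""
    need = q * len(seqs)
    motifs: list[str] = []

    def dfs(prefix: str) -> None:
        if support(prefix, seqs, l, d) < need:
            return  # prune this branch of the pattern tree
        if len(prefix) == l:
            motifs.append(prefix)
            return
        for base in ALPHABET:
            dfs(prefix + base)

    dfs("")
    return motifs


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGTTT", "GGGACGAA"]
    # Exact-match 4-mers shared by at least half of the three sequences.
    print(pattern_tree_search(seqs, l=4, d=0, q=0.5))
```

On the three toy sequences, the search with d = 0 and q = 0.5 returns the two 4-mers shared by the first two sequences, ACGT and TACG. The brute-force tree has 4^l leaves, so a practical exact PMS algorithm depends on much stronger pruning than this sketch shows.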
