Abstract

Finding patterns in biological sequences is a crucial and intriguing task. This paper explores the (Ɩ, d) motif search problem, also known as Planted Motif Search (PMS), and discusses why it is challenging: the problem is NP-hard. PMS and (Ɩ, d) motif search algorithms are believed to represent the next generation of tools for motif discovery. PMS takes n biological sequences and two parameters, Ɩ and d, and identifies all strings of length Ɩ that occur in every input sequence with at most d mismatches. Many existing exact PMS algorithms exhibit exponential time complexity in the worst case. This paper introduces an algorithm that focuses on improving the efficiency of the sample-driven portion of the process. Specifically, dynamic programming is employed to avoid redundant calculations in frequently used subtrees. The paper also presents further approaches to improve performance, such as using a trie, which significantly reduces the time spent in the “sort rows by size” step. The trie also reduces the space occupied by the linked lists of LL-PMS8 (Hasan et al., Jun., 2022), or the number of Ɩ-mers. Using the trie as the main means of speeding up the search gives much better results than older PMS methods such as LL-PMS8 (Hasan et al., Jun., 2022). Compared with the previous method, the overall time is reduced by 26.17% and 16.48% on real-world and generated datasets, respectively (Hasan et al., 2020).
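
To make the problem statement concrete, the following is a minimal brute-force sketch of the (Ɩ, d) motif definition only. It is not the algorithm proposed in the paper, nor LL-PMS8: it uses no trie and no dynamic programming, and it enumerates every candidate of length Ɩ, so it is exponential in Ɩ. The function names (`hamming`, `occurs_within`, `planted_motifs`) and the DNA alphabet default are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq has at most d mismatches with motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """Every string of length l that occurs in all seqs with <= d mismatches.

    Illustrative brute force: tries all len(alphabet)**l candidates.
    """
    result = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        if all(occurs_within(candidate, s, d) for s in seqs):
            result.append(candidate)
    return result
```

For example, `planted_motifs(["AACGT", "CGTAA"], 3, 0)` considers all 64 strings of length 3 and keeps only those found exactly in both sequences. Exact PMS algorithms aim to reach the same answer set while pruning most of this candidate space.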
