Abstract

The remarkable growth of biological data is a motivation to accelerate the discovery of solutions in many domains of computational bioinformatics. In different phases of the computational pipelines, pattern matching is a very practical operation. For example, pattern matching enables users to find the locations of particular DNA subsequences in a database or DNA sequence. Furthermore, in these expanding biological databases, some patterns are updated over time. To perform faster searches, high-speed pattern matching algorithms are needed. The present paper introduces three pattern matching algorithms that are specially formulated to speed up searches on large DNA sequences. The proposed algorithms raise performance by utilizing word processing (in place of the character processing presented in previous works) and also by searching the least frequent word of the pattern in the sequence. In terms of time cost, the experimental results demonstrate the superiority of the presented algorithms over the other simulated algorithms.

Highlights

  • In the pattern matching problem, a text, sequence or database is scanned to detect the locations of a pattern in the text [1], [2]

  • The Least Frequency Pattern Matching (LFPM) algorithm is an enhancement of Processor-Aware Pattern Matching (PAPM) that is specialized for DNA applications

  • This section compares the performance of the presented algorithms (FLPM, PAPM, and LFPM) with that of the Brute Force (BF), Boyer-Moore (BM), and Divide and Conquer Pattern Matching (DCPM) algorithms

Summary

INTRODUCTION

In the pattern matching problem, a text, sequence, or database is scanned to detect the locations of a pattern [1], [2]. The problem arises in many areas of computational bioinformatics, including basic local alignment search, biomarker discovery, sequence alignment, proteogenomic mapping, and homologous series detection. In these disciplines, there is a need to recognize the locations of multiple patterns, including those of amino acids and nucleotides, in databases [7], [8]. Processing the text word by word, in place of character by character, creates a new class of string-matching algorithms that improve on the performance of character-based algorithms. By employing this method, the current work decreases the number of detected windows and speeds up the comparisons. The algorithm also searches the text for a low-frequency word of the pattern. This technique further improves the algorithm’s efficiency by decreasing the number of discovered windows.
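The idea of filtering candidate windows by a low-frequency word of the pattern can be illustrated with a minimal Python sketch. This is not the paper's LFPM algorithm: the function name `least_frequent_word_search`, the default `word_len` of 4 characters, and the choice to count word frequencies in the text itself are all assumptions made for illustration. The sketch picks the fixed-length word of the pattern that is rarest in the text, locates that word, and verifies the full pattern only at those candidate windows.

```python
from collections import Counter
from typing import List


def least_frequent_word_search(text: str, pattern: str, word_len: int = 4) -> List[int]:
    """Return the start index of every occurrence of pattern in text.

    Illustrative only: word_len and the frequency source are assumptions,
    not parameters taken from the paper.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    w = min(word_len, m)

    # Count every length-w word of the text in one linear pass.
    text_freq = Counter(text[i:i + w] for i in range(n - w + 1))

    # Choose the offset of the pattern word that is rarest in the text.
    offset = min(range(m - w + 1), key=lambda k: text_freq[pattern[k:k + w]])
    rare_word = pattern[offset:offset + w]

    matches = []
    # Start at `offset` so the implied window never begins before index 0.
    pos = text.find(rare_word, offset)
    while pos != -1:
        start = pos - offset
        if start + m > n:
            break  # the window would run past the end of the text
        # Candidate window found: verify the whole pattern.
        if text.startswith(pattern, start):
            matches.append(start)
        pos = text.find(rare_word, pos + 1)
    return matches


print(least_frequent_word_search("ACGTACGTTTGACGTA", "ACGTA"))  # [0, 11]
```

The rarer the chosen word, the fewer windows need full verification, which is the effect the introduction describes as decreasing the number of discovered windows. A real implementation would compare machine words instead of Python substrings.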

RELATED WORK
LEAST FREQUENCY PATTERN MATCHING ALGORITHM
RESULTS
CONCLUSION
