Simple and Efficient Pattern Matching Algorithms for Biological Sequences

Peyman Neamatollahi, Montassir Hadi, Mahmoud Naghibzadeh

doi:10.1109/access.2020.2969038

Open Access

https://doi.org/10.1109/access.2020.2969038

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 34
License type: CC BY 4.0

Affiliation: Ferdowsi University of Mashhad

Abstract

The remarkable growth of biological data motivates faster discovery of solutions in many domains of computational bioinformatics. Pattern matching is a practical operation in many phases of computational pipelines; for example, it enables users to find the locations of particular DNA subsequences in a database or a DNA sequence. Furthermore, in these expanding biological databases, some patterns are updated over time. Faster searches therefore require high-speed pattern matching algorithms. The present paper introduces three pattern matching algorithms that are specially formulated to speed up searches on large DNA sequences. The proposed algorithms improve performance by processing words instead of the individual characters processed in previous works, and by searching the sequence for the least frequent word of the pattern. The experimental results show that the presented algorithms have a lower time cost than the other simulated algorithms.
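
To illustrate the difference between word processing and character processing, the following Python sketch compares a DNA pattern against the text one packed word at a time. It is a minimal sketch under assumptions that do not come from the paper: the alphabet is limited to A, C, G and T, each base takes 2 bits, and `w` (a hypothetical word width in bases) bases are packed into one integer. The names `rolling_words` and `word_search` are illustrative and are not the authors' PAPM or FLPM code.

```python
# Sketch only: word-at-a-time DNA matching with 2-bit packing.
# Assumptions (not from the paper): alphabet A/C/G/T, 2 bits per base,
# and a hypothetical word width of w bases per packed integer.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def rolling_words(seq: str, w: int) -> list[int]:
    """Packed value of every length-w window; index i covers seq[i:i + w]."""
    mask = (1 << (2 * w)) - 1  # keep only the newest w bases
    words, value = [], 0
    for i, base in enumerate(seq):
        value = ((value << 2) | CODE[base]) & mask
        if i >= w - 1:
            words.append(value)
    return words


def word_search(text: str, pattern: str, w: int = 8) -> list[int]:
    """Return all start positions of pattern in text, comparing w bases per step."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    w = min(w, m)  # a pattern shorter than one word becomes a single word
    text_words = rolling_words(text, w)
    # Offsets of the pattern's words; the last word may overlap the previous one
    # so that every word is exactly w bases wide.
    offsets = list(range(0, m - w + 1, w))
    if offsets[-1] != m - w:
        offsets.append(m - w)
    pattern_words = [(off, rolling_words(pattern[off:off + w], w)[0]) for off in offsets]
    hits = []
    for pos in range(n - m + 1):
        # One integer comparison per word instead of one per character.
        if all(text_words[pos + off] == word for off, word in pattern_words):
            hits.append(pos)
    return hits


print(word_search("GATTACAGATTACA", "GATTACA", w=3))  # [0, 7]
```

With `w` bases per word, checking a window takes roughly `m / w` integer comparisons instead of up to `m` character comparisons; letting the last word overlap its predecessor keeps every word the same width.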

Highlights

In the pattern matching problem, a text, sequence or database is scanned to detect the locations of a pattern in the text [1], [2].
The Least Frequency Pattern Matching (LFPM) algorithm is an enhancement of Processor-Aware Pattern Matching (PAPM) that is specialized for DNA applications.
The evaluation compares the performance of the presented algorithms (FLPM, PAPM, and LFPM) with that of the Brute Force (BF), Boyer-Moore (BM), and Divide and Conquer Pattern Matching (DCPM) algorithms.
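
The Brute Force (BF) baseline listed above also gives a precise reading of the problem stated in the first highlight: report every position at which the pattern occurs in the text. The Python sketch below (the function name `brute_force_search` is illustrative) slides a window one position at a time and compares one character per step, which is the character processing that the proposed algorithms aim to replace.

```python
def brute_force_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based start position of pattern in text (overlaps included)."""
    n, m = len(text), len(pattern)
    hits = []
    if m == 0:
        return hits
    for i in range(n - m + 1):  # slide the window one position at a time
        j = 0
        # Compare one character per step; stop at the first mismatch.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:  # every character of the window matched
            hits.append(i)
    return hits


print(brute_force_search("ACGTTACGTACG", "ACG"))  # [0, 5, 9]
```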

Summary

INTRODUCTION

In the pattern matching problem, a text, sequence or database is scanned to detect the locations of a pattern in the text [1], [2]. The problem arises in many areas of computational bioinformatics, including basic local alignment search, biomarker discovery, sequence alignment, proteogenomic mapping, and homologous series detection. In these disciplines, the locations of multiple patterns, including amino acid and nucleotide patterns, must be recognized in databases [7], [8].

Processing words instead of individual characters creates a new class of string-matching algorithms that improve on the performance of character-based algorithms. By employing this method, the current work decreases the number of detected windows and speeds up the comparisons. The LFPM algorithm additionally searches the text for a low-frequency word of the pattern, which further improves efficiency by decreasing the number of discovered windows.
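
The least-frequent-word idea described above can be sketched as a filter-and-verify search. The Python sketch below is one interpretation under stated assumptions, not the authors' LFPM: words are length-`k` substrings (`k` is a hypothetical parameter), their frequencies are counted directly from the text, and each occurrence of the pattern's rarest word proposes exactly one candidate window, which is then verified in full. The function name is illustrative.

```python
from collections import Counter


def least_frequent_word_search(text: str, pattern: str, k: int = 4) -> list[int]:
    """Find all occurrences of pattern by anchoring on its rarest length-k word.

    Sketch only: k and the frequency counting below are assumptions,
    not the LFPM algorithm as published.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    k = min(k, m)
    # Frequency of every length-k word of the text; a repeatedly searched
    # database could precompute these counts once.
    freq = Counter(text[i:i + k] for i in range(n - k + 1))
    # Offset of the pattern word that occurs least often in the text.
    offset = min(range(m - k + 1), key=lambda off: freq[pattern[off:off + k]])
    anchor = pattern[offset:offset + k]
    hits = []
    start = text.find(anchor)
    while start != -1:
        pos = start - offset  # window start implied by this anchor occurrence
        # Verify the whole pattern only inside candidate windows.
        if 0 <= pos <= n - m and text[pos:pos + m] == pattern:
            hits.append(pos)
        start = text.find(anchor, start + 1)
    return hits


print(least_frequent_word_search("ACGTACGTTTGACGT", "ACGTT", k=3))  # [4]
```

In this example the 3-base words ACG and CGT each occur three times in the text, while GTT occurs once, so only a single candidate window is verified. Counting frequencies scans the whole text once, so in this sketch the saving appears when the counts are reused across many searches.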

RELATED WORK

LEAST FREQUENCY PATTERN MATCHING ALGORITHM

RESULTS

VIII. CONCLUSION

Similar Papers

Efficient Pattern Matching Algorithms for DNA Sequences
Peyman Neamatollahi ... Montassir Hadi
01 Jan 2020

An Improved Hashing Approach for Biological Sequence to Solve Exact Pattern Matching Problems
Prince Mahmud ... Anisur Rahman
Applied Computational Intelligence and Soft Computing | VOL. 2023
20 Nov 2023

A new fast technique for pattern matching in biological sequences
Osman Ali Sadek Ibrahim ... Belal A Hamed
The Journal of Supercomputing | VOL. 79
10 Jul 2022

Attacking Pattern Matching Algorithms Based on the Gap between Average-Case and Worst-Case Complexity
Yu Zhang ... Dongjin Fan
Journal of Advances in Computer Networks
01 Jan 2013
