A Character Frequency based Approach to Search for Substrings of a Circular Pattern and its Conjugates in an Online Text

Vinod Prasad

doi:10.7494/csci.2021.22.2.3401

Abstract

A fundamental problem in computational biology is dealing with circular patterns. The problem consists of finding, in a database, all substrings of at least a given length of a pattern and of its rotations. In this paper, a novel method is presented to deal with circular patterns. The problem is solved in two incremental steps. First, an algorithm is given that reports all substrings of a given linear pattern in an online text. Next, without losing efficiency, the algorithm is extended to process all circular rotations of the pattern. For a given pattern P of size M and a text T of size N, the algorithm reports all locations in the text where a substring of Pc is found, where Pc is one of the rotations of P. For an alphabet of size σ and using O(M) space, these goals are achieved in O(MN/σ) average time, which is O(N) for all patterns of length M ≤ σ. Traditional string processing algorithms make use of advanced data structures such as suffix trees and automata. We show that basic data structures such as arrays can be used in text processing algorithms without compromising efficiency.
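The abstract describes the method only at a high level, so the following is a minimal sketch under stated assumptions, not the author's algorithm. It assumes that "character frequency" and "arrays" mean a per-character list of the positions where each symbol occurs in the pattern, and it uses that index to update a longest-common-suffix table only at pattern positions holding the current text character. For random strings over an alphabet of size σ this touches about M/σ positions per text character, which is one way to reach the stated O(MN/σ) average time in O(M) space. The circular case searches P+P and caps match lengths at M, because the substrings of the rotations of P are exactly the substrings of P+P of length at most M. The function name `substring_matches`, the threshold `min_len`, and the `(end, length)` output format are hypothetical.

```python
from collections import defaultdict


def substring_matches(pattern, text, min_len, circular=False):
    """Yield (end, length) for each text position `end` at which the longest
    suffix of text[:end + 1] that is a substring of `pattern` (or, when
    `circular` is set, of some rotation of `pattern`) has length >= min_len.

    Illustrative sketch: names and output format are assumptions.
    """
    m = len(pattern)
    # Substrings of rotations of P are exactly substrings of P+P of length <= M.
    ref = pattern + pattern if circular else pattern

    # Preprocessing: positions of each character in ref (O(M) entries in total).
    positions = defaultdict(list)
    for j, c in enumerate(ref):
        positions[c].append(j)

    # prev[j] = length of the longest common suffix of the text read so far
    # and ref[:j + 1]; only nonzero entries are stored.
    prev = {}
    for i, c in enumerate(text):
        cur = {}
        # Only positions holding the current character can extend a match.
        for j in positions.get(c, ()):
            cur[j] = prev.get(j - 1, 0) + 1
        prev = cur
        if cur:
            # Cap at M so that a match in P+P corresponds to a single rotation.
            best = min(max(cur.values()), m)
            if best >= min_len:
                yield i, best
```

For example, with pattern `"abc"`, text `"xxbcayy"` and `min_len=2`, the linear search reports only `(3, 2)` for `"bc"`, while the circular search also reports `(4, 3)` for `"bca"`, which is a rotation of the pattern.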

Highlights

A fundamental problem in computer science is searching for a pattern in a text
A fundamental problem in computational biology is dealing with circular patterns
For a given pattern P of size M and a text T of size N, the extended algorithm reports all of the locations in the text where a substring of Pc is found, where Pc is one of the rotations of P
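To pin down what the extended algorithm is expected to report, the following brute-force reference enumerates every rotation (conjugate) of P and every substring of each rotation, then checks every text window directly. It is a definition for testing on tiny inputs, not the paper's method: it stores O(M³) characters and takes far more time than the stated bound. The names `rotations`, `brute_force_circular` and `min_len` and the `(start, length)` output format are hypothetical.

```python
def rotations(p):
    """All len(p) cyclic rotations (conjugates) of p."""
    return [p[k:] + p[:k] for k in range(len(p))]


def brute_force_circular(pattern, text, min_len):
    """Return (start, length) for every occurrence in `text` of a string of
    length >= min_len that is a substring of some rotation of `pattern`.

    Reference only: the set holds every substring of every rotation.
    """
    m = len(pattern)
    subs = {r[s:s + k]
            for r in rotations(pattern)
            for s in range(m)
            for k in range(1, m - s + 1)}
    hits = []
    for start in range(len(text)):
        for k in range(min_len, m + 1):
            if start + k > len(text):
                break  # the window would run past the end of the text
            if text[start:start + k] in subs:
                hits.append((start, k))
    return hits
```

For pattern `"abc"` and text `"xxbcayy"` with `min_len=2`, this reports `"bc"` and `"bca"` starting at position 2 and `"ca"` starting at position 3.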

Summary

Introduction

A fundamental problem in computer science is searching for a pattern in a text. A slightly different problem (‘circular-pattern-matching,’ or CPM) deals with circular patterns. The approach presented in [12] builds suffix trees for the text T and for T^R, where T^R is obtained by reversing T. Another suffix tree-based algorithm [9] builds a generalized cyclic suffix tree for all of the rotations of a given set of sequences and performs the search in O(N) time. Another widely used data structure, the suffix automaton, is the smallest deterministic finite automaton that recognizes all of the suffixes of a given pattern [3,6].
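For contrast with the array-based approach, the following sketch shows the kind of automaton-based solution the introduction refers to. It is the standard online suffix automaton construction, used to compute, for each text position, the length of the longest suffix of the text read so far that is a substring of P. Building the automaton on P+P and capping lengths at M extends this to all rotations of P. This is a textbook technique chosen for illustration, not one of the cited algorithms [3,6,9,12]; the function names and the `circular` flag are hypothetical.

```python
def build_suffix_automaton(s):
    """Standard online construction; returns (link, length, trans) lists."""
    link, length, trans = [-1], [0], [{}]
    last = 0
    for c in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        # Add transitions on c from every suffix state that lacks one.
        while p != -1 and c not in trans[p]:
            trans[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][c]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q by cloning it with the shorter length.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(c) == q:
                    trans[p][c] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return link, length, trans


def longest_match_ending_at(pattern, text, circular=False):
    """For each text position, length of the longest suffix of the text read
    so far that is a substring of `pattern` (or of one of its rotations)."""
    m = len(pattern)
    link, length, trans = build_suffix_automaton(
        pattern + pattern if circular else pattern)
    state, cur_len, out = 0, 0, []
    for c in text:
        # Follow suffix links until some state can be extended by c.
        while state != 0 and c not in trans[state]:
            state = link[state]
            cur_len = length[state]
        if c in trans[state]:
            state = trans[state][c]
            cur_len += 1
        else:
            state, cur_len = 0, 0
        out.append(min(cur_len, m))  # cap at M: one rotation at most
    return out
```

The automaton has O(M) states and processes the text online in amortized O(N) time for a fixed alphabet, but it relies on a more involved data structure than the per-character arrays the paper advocates.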

Assumptions and notations

Pattern preprocessing

Substring search algorithm: linear patterns

Time and space analysis

Experimental results

Journal: Computer Science	Publication Date: Apr 15, 2021
License type: CC BY 4.0

Similar Papers

Non-biopsy detection of intestinal metaplasia and dysplasia in Barrett’s esophagus: a prospective multicenter study
P Sharma ... N Marcon
Endoscopy | VOL. 38
01 Dec 2006

Indexing Circular Patterns
Costas S Iliopoulos ... M Sohel Rahman
01 Jan 2008

Determination of palatal rugae patterns among two ethnic populations of India by logistic regression analysis
Vijayalakshmi S Kotrashetti ... Alka D Kale
Journal of Forensic and Legal Medicine | VOL. 18
19 Aug 2011

An investigation into the effect of scanning pattern and heat treatment on the mechanical properties of Inconel 718 in the direct metal deposition process
Fareed Kermani ... Mohammad Gavahian
Journal of Materials Research and Technology | VOL. 24
17 Apr 2023