Abstract

A fundamental problem in computational biology is dealing with circular patterns. The problem consists of finding, in a database, the substrings of at least a given length that occur in a pattern or in one of its rotations. In this paper, a novel method is presented to deal with circular patterns. The problem is solved in two incremental steps. First, an algorithm is provided that reports all substrings of a given linear pattern in an online text. Next, without loss of efficiency, the algorithm is extended to process all circular rotations of the pattern. For a given pattern P of size M and a text T of size N, the algorithm reports all locations in the text where a substring of Pc is found, where Pc is one of the rotations of P. For an alphabet of size σ, the algorithm uses O(M) space and runs in O(MN/σ) average time, which is O(N) for all patterns of length M ≤ σ. Traditional string processing algorithms make use of advanced data structures such as suffix trees and automata. We show that basic data structures such as arrays can be used in text processing algorithms without compromising efficiency.
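To make the two incremental steps concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's algorithm, which the abstract does not spell out; it is a simple array-based dynamic program chosen for illustration, and the names `longest_matches`, `circular_matches`, `min_len` (the length threshold) and `max_len` are hypothetical. For each text character only the pattern positions holding that same character are examined, so on uniformly random strings each step touches about M/σ positions, in line with the O(MN/σ) average bound, and the working state stays O(M). The circular step relies on one observation: every rotation of P is a length-M window of P followed by its first M-1 characters, so a string of length at most M is a substring of some rotation exactly when it is a substring of that doubled string.

```python
from collections import defaultdict


def longest_matches(pattern, text, min_len, max_len=None):
    """Map each text index i to the length of the longest substring of
    `pattern` ending at text[i], keeping only lengths >= min_len.

    Hypothetical illustration: a sparse dynamic program over
    (text, pattern) positions, not the algorithm from the paper.
    """
    # Array-of-lists index: occ[c] holds every position of c in the pattern.
    occ = defaultdict(list)
    for j, c in enumerate(pattern):
        occ[c].append(j)

    cap = len(pattern) if max_len is None else max_len
    prev = {}   # pattern index j -> match length ending at (i - 1, j)
    found = {}
    for i, c in enumerate(text):
        cur = {}
        # Only pattern positions equal to text[i] can extend a match; for
        # uniformly random strings there are about M / sigma of them.
        for j in occ.get(c, ()):
            cur[j] = min(prev.get(j - 1, 0) + 1, cap)
        best = max(cur.values(), default=0)
        if best >= min_len:
            found[i] = best
        prev = cur
    return found


def circular_matches(pattern, text, min_len):
    """Same report for substrings of any rotation of `pattern`.

    Every rotation of P is a length-M window of P + P[:M-1], so a string of
    length <= M occurs in some rotation iff it occurs in that doubled string.
    Capping match lengths at M enforces the length bound.
    """
    m = len(pattern)
    doubled = pattern + pattern[:m - 1]
    return longest_matches(doubled, text, min_len, max_len=m)


print(longest_matches("abcd", "xxcdabyy", 2))   # {3: 2, 5: 2}
print(circular_matches("abcd", "xxcdabyy", 3))  # {4: 3, 5: 4}
```

In the usage lines, the linear search only finds "cd" and "ab", while the circular search also finds "cda" and "cdab", because "cdab" is a rotation of "abcd". The doubled pattern has 2M-1 characters, so the space bound remains O(M).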

Highlights

  • A fundamental problem in computer science is searching for a pattern in a text

  • A fundamental problem in computational biology is dealing with circular patterns

  • For a given pattern P of size M and a text T of size N, the extended algorithm reports all of the locations in the text where a substring of Pc is found, where Pc is one of the rotations of P


Summary

Introduction

A fundamental problem in computer science is searching for a pattern in a text. A slightly different problem (‘circular-pattern-matching,’ or CPM) deals with circular patterns. The approach presented in [12] builds suffix trees for the text T and for T^R, where T^R is obtained by reversing T. Another suffix-tree-based algorithm [9] creates a generalized cyclic suffix tree for all of the rotations of a given set of sequences and performs the search in O(N) time. Another widely used data structure, the suffix automaton, simulates the smallest deterministic finite automaton that recognizes all of the suffixes of a given pattern [3,6].
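For readers unfamiliar with the suffix automaton mentioned above, the following Python sketch uses the well-known online construction with suffix links and state cloning. It is a standard textbook technique, not code from [3,6] or from this paper, and the names `build_suffix_automaton` and `accepts` are hypothetical. Once the automaton of a string is built, exactly the suffixes of that string lead from the start state to a terminal state.

```python
def build_suffix_automaton(s):
    """Online construction of the suffix automaton of s.

    Returns (transitions, terminal_states); state 0 is the start state.
    """
    nxt = [{}]      # nxt[v]: outgoing transitions of state v
    link = [-1]     # link[v]: suffix link of state v
    length = [0]    # length[v]: length of the longest string reaching v
    last = 0        # state reached by the whole prefix read so far
    for c in s:
        cur = len(nxt)
        nxt.append({})
        length.append(length[last] + 1)
        link.append(-1)
        # Add transitions on c from suffixes of the old prefix that lack one.
        p = last
        while p != -1 and c not in nxt[p]:
            nxt[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][c]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone copies q's transitions, with a shorter length.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                length.append(length[p] + 1)
                link.append(link[q])
                while p != -1 and nxt[p].get(c) == q:
                    nxt[p][c] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Terminal states lie on the suffix-link path from the final state.
    terminal = set()
    p = last
    while p != -1:
        terminal.add(p)
        p = link[p]
    return nxt, terminal


def accepts(automaton, word):
    """True iff `word` is a suffix of the string the automaton was built from."""
    nxt, terminal = automaton
    state = 0
    for c in word:
        if c not in nxt[state]:
            return False
        state = nxt[state][c]
    return state in terminal


sa = build_suffix_automaton("abcbc")
print(accepts(sa, "cbc"))  # True
print(accepts(sa, "bcb"))  # False
```

The construction adds at most two states per input character, so the automaton of a length-M pattern has O(M) states, which is the compactness that makes it attractive for pattern matching.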

Assumptions and notations
Pattern preprocessing
Substring search algorithm: linear patterns
Time and space analysis
10. Experimental results