Abstract

Pattern matching with both gap constraints and the one-off condition is a challenging problem, with applications in bioinformatics, information retrieval, and dictionary queries. Among the existing algorithms for this problem, the most efficient is SAIL, yet it remains time-consuming, especially for long patterns. In addition, existing bit-parallel algorithms cannot handle patterns that have only one pattern character between successive wildcard gaps when the minimum local gap length is zero. We propose BPBM, an algorithm for online sequential pattern matching. BPBM uses an extended bit-parallel operation to accelerate matching, together with an effective transition-window mechanism based on two nondeterministic finite automata (NFAs) that discards useless scan windows. It identifies gap constraints automatically and needs only a single scan to output occurrences with their exact match positions. Theoretical analysis and experimental results show that BPBM is more competitive than its peers. It has a clear advantage in search time complexity, and it is also more stable, with operation costs decreasing as the alphabet size or the pattern length increases. We also study offline pattern matching, where two pruning passes, left-most and right-most, increase the matching ratio by about 2.08% on average.
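To make the idea of bit-parallel matching under gap constraints concrete, the sketch below shows a plain Shift-And automaton extended with wildcard positions, some of which are optional. It is an illustration under stated assumptions and is not BPBM. It assumes a pattern of the form p1 g(N1,M1) p2 ... pm, where g(Ni,Mi) means "between Ni and Mi arbitrary characters". Each gap is expanded into Ni mandatory wildcard positions followed by Mi - Ni optional ones, and optional positions are skipped through an epsilon closure. The function names `build_masks` and `gap_match_ends` are hypothetical. The sketch reports only the text positions where an occurrence ends and does not enforce the one-off condition.

```python
from typing import Dict, List, Tuple


def build_masks(chars: str, gaps: List[Tuple[int, int]]):
    """Expand p1 g(N1,M1) p2 ... pm into a linear NFA, one bit per position.

    Returns (char_masks, wildcard_mask, optional_mask, length).
    Hypothetical helper for illustration only.
    """
    assert len(gaps) == len(chars) - 1, "one (min, max) gap between each pair of characters"
    char_masks: Dict[str, int] = {}
    wild = 0      # positions that accept any character
    optional = 0  # wildcard positions that may be skipped (epsilon transitions)
    pos = 0
    for i, c in enumerate(chars):
        char_masks[c] = char_masks.get(c, 0) | (1 << pos)
        pos += 1
        if i < len(gaps):
            lo, hi = gaps[i]
            for k in range(hi):
                wild |= 1 << pos
                if k >= lo:  # the first `lo` wildcards are mandatory
                    optional |= 1 << pos
                pos += 1
    return char_masks, wild, optional, pos


def gap_match_ends(text: str, chars: str, gaps: List[Tuple[int, int]]) -> List[int]:
    """Return every text index at which some gap-constrained occurrence ends."""
    char_masks, wild, optional, length = build_masks(chars, gaps)
    final = 1 << (length - 1)  # last position is always the character pm
    state = 0
    ends = []
    for t, c in enumerate(text):
        # Shift-And step: advance every active state and start a new attempt at bit 0.
        state = ((state << 1) | 1) & (char_masks.get(c, 0) | wild)
        # Epsilon closure: an active state may skip the next optional wildcard position.
        while True:
            grown = state | ((state << 1) & optional)
            if grown == state:
                break
            state = grown
        if state & final:
            ends.append(t)
    return ends
```

For example, `gap_match_ends("axxb", "ab", [(1, 2)])` reports `[3]`, since `a` and `b` are separated by two characters, which lies within the gap range [1, 2]. The closure is computed with a simple loop for readability. Python integers remove the usual machine-word limit on pattern length, at the cost of speed compared with fixed-width bit vectors.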
