Abstract

This paper presents an automata-based algorithm that finds the valid shifts of a given set of words W in a text T. Unlike known string matching algorithms, the preprocessing phase is applied to T rather than to the words being searched for. In this phase, a deterministic finite automaton (DFA) that recognizes the words in T is built and augmented with their shifts in T. The preprocessing phase is relatively expensive in time and space, but it needs to be done only once per text document, regardless of how many words are later matched against it. The algorithm is analyzed for complexity, implemented, and compared with an adjusted version of the KMP algorithm. It performed better than the KMP algorithm when a large number of words are matched in T.

Highlights

  • This paper considers a special case of the string matching problem [1], called multiple word matching

  • The motivation for this research is that it is common to have a text document that needs to be repeatedly searched for single words

  • The idea is based on scanning the words in T and incrementally building a deterministic finite automaton (DFA) [2] that recognizes only the words of T


Summary

INTRODUCTION

A special case of the string matching problem [1], called multiple word matching, is considered. The motivation for this research is that it is common to have a text document that needs to be repeatedly searched for single words. Another motivation is the speed of the proposed algorithm on this problem compared with other matching algorithms for large |W|. The DFA is used to search for a set of words W (repetition of words in W is allowed). Although building this DFA is time consuming, it needs to be built only once to search for any number of words in T. The paper ends with a conclusion and a list of references.
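To make the idea concrete, the following is a minimal Python sketch of preprocessing T into a word-recognizing automaton and then querying it. It is an illustration under stated assumptions, not the paper's implementation: words are assumed to be whitespace-delimited, shifts are assumed to be 0-based character offsets into T, and the automaton is stored as a trie (a DFA with a partial transition function). The names `State`, `build_word_dfa`, and `find_shifts` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class State:
    # Outgoing transitions, keyed by a single character.
    transitions: dict = field(default_factory=dict)
    # Shifts (0-based offsets in T) of the word that ends in this state.
    # A non-empty list marks the state as accepting.
    shifts: list = field(default_factory=list)


def build_word_dfa(text):
    """Scan the words of `text` once and build the automaton incrementally."""
    start = State()
    # Assumption: a "word" is a maximal run of non-whitespace characters.
    for match in re.finditer(r"\S+", text):
        state = start
        for ch in match.group():
            state = state.transitions.setdefault(ch, State())
        state.shifts.append(match.start())
    return start


def find_shifts(start, word):
    """Return every shift of `word` in the preprocessed text, or []."""
    state = start
    for ch in word:
        state = state.transitions.get(ch)
        if state is None:  # no transition: the word does not occur in T
            return []
    return list(state.shifts)


# Preprocess T once, then match any number of words against it.
T = "the cat saw the dog"
dfa = build_word_dfa(T)
W = ["the", "dog", "bird", "the"]  # repetition in W is allowed
results = {w: find_shifts(dfa, w) for w in W}
```

In this sketch each lookup costs time proportional to the length of the queried word and is independent of the length of T, which is why the one-time construction cost can pay off when |W| is large.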

RELATED WORK
PROPOSED ALGORITHM
ANALYSIS OF THE PROPOSED ALGORITHM
EXPERIMENTAL STUDY
CONCLUSIONS