Abstract

Consider a regular expression r of length m and a text string T of length n over an alphabet Σ. The regular expression (RE) shortest substring search problem is to find all shortest substrings of T that match r. The previous algorithm, proposed by Clarke and Cormack, uses an ε-free nondeterministic finite automaton (NFA) and runs in O(ksn) time and O(s) space, where s is the number of states and k is the maximum number of outgoing transitions for any state and symbol. In general, an ε-free NFA obtained from a regular expression has s = O(m) and k = O(m), so that algorithm takes O(m²n) time and O(m) space. We propose a faster algorithm that runs in O(mn) time and O(m) space. It is based on a Thompson automaton, an NFA with ε-transitions.
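The abstract gives no pseudocode, so the following Python sketch only illustrates how a single pass of a Thompson automaton over T can report shortest matches. It is not the authors' algorithm. It assumes that a shortest substring is a substring of T that matches r and contains no other matching substring (the paper's exact definition may differ). It supports only literals, |, *, +, ? and parentheses, and all names (State, Parser, shortest_substrings) are hypothetical. For every active state it keeps the largest start position of any run reaching that state. Seeding the fresh run at position j first and stepping older runs in non-increasing start order means each state is expanded at most once per text position.

```python
class State:
    """A Thompson NFA state: an optional symbol edge plus a list of ε-edges."""
    def __init__(self):
        self.char, self.out = None, None  # labelled edge (symbol, target), if any
        self.eps = []                     # ε-successors


class Parser:
    """Builds Thompson fragments (start, accept) for literals, |, *, +, ? and ( )."""
    def __init__(self, pattern):
        self.p, self.i = pattern, 0

    def peek(self):
        return self.p[self.i] if self.i < len(self.p) else None

    def parse(self):
        frag = self.alt()
        if self.i != len(self.p):
            raise ValueError(f"unexpected {self.peek()!r} at position {self.i}")
        return frag

    def alt(self):
        s, a = self.concat()
        while self.peek() == "|":
            self.i += 1
            s2, a2 = self.concat()
            start, accept = State(), State()
            start.eps = [s, s2]
            a.eps.append(accept)
            a2.eps.append(accept)
            s, a = start, accept
        return s, a

    def concat(self):
        s = a = State()  # an empty concatenation matches ε
        while self.peek() not in (None, "|", ")"):
            s2, a2 = self.repeat()
            a.eps.append(s2)
            a = a2
        return s, a

    def repeat(self):
        s, a = self.atom()
        while self.peek() in ("*", "+", "?"):
            op = self.peek()
            self.i += 1
            start, accept = State(), State()
            start.eps = [s] if op == "+" else [s, accept]  # '+' must enter the body
            a.eps.extend([accept] if op == "?" else [s, accept])  # '*', '+' loop back
            s, a = start, accept
        return s, a

    def atom(self):
        c = self.peek()
        if c == "(":
            self.i += 1
            frag = self.alt()
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.i += 1
            return frag
        if c is None or c in "|)*+?":
            raise ValueError(f"unexpected {c!r} at position {self.i}")
        self.i += 1
        start, accept = State(), State()
        start.char, start.out = c, accept
        return start, accept


def _closure(q, st, seen, out):
    """Append every unseen state ε-reachable from q to out, tagged with start st."""
    stack = [q]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            out.append((s, st))
            stack.extend(s.eps)


def shortest_substrings(pattern, text):
    """Spans (i, j) of matches containing no other match (ASSUMED meaning of "shortest")."""
    start, accept = Parser(pattern).parse()
    result, best_prev, cur = [], -1, []
    for j in range(len(text) + 1):
        nxt, seen = [], set()
        _closure(start, j, seen, nxt)  # a run starting at j has the largest start
        for q, st in cur:  # empty when j == 0; ordered by non-increasing start
            if q.char == text[j - 1]:
                _closure(q.out, st, seen, nxt)  # first visit keeps the largest start
        cur = nxt
        # Largest i such that text[i:j] matches; the accept state occurs at most once.
        i = next((st for s, st in cur if s is accept), None)
        # text[i:j] is minimal iff no earlier end position had a match starting at >= i.
        if i is not None and i > best_prev:
            result.append((i, j))
            best_prev = i
    return result
```

For example, with r = a(a|b)*b and T = aabb the sketch returns [(1, 3)], the half-open span of ab, because aab, abb and aabb each contain it. Since a Thompson automaton has O(m) states and transitions, each iteration of the main loop does O(m) work, so this sketch runs in O(mn) time and, apart from the output list, O(m) space.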
