Abstract

We consider the problem of determinizing and minimizing automata for nested words in practice. For this we compile the nested regular expressions (NREs) from the usual XPath benchmark to nested word automata (NWAs). The determinization of these NWAs, however, fails to produce reasonably small automata. In the best case, huge deterministic NWAs are produced after few hours, even for relatively small NREs of the benchmark. We propose a different approach to the determinization of automata for nested words. For this, we introduce stepwise hedge automata (SHAs) that generalize naturally on both (stepwise) tree automata and on finite word automata. We then show how to determinize SHAs, yielding reasonably small deterministic automata for the NREs from the XPath benchmark. The size of deterministic SHAs automata can be reduced further by a novel minimization algorithm for a subclass of SHAs. In order to understand why the new approach to determinization and minimization works so nicely, we investigate the relationship between NWAs and SHAs further. Clearly, deterministic SHAs can be compiled to deterministic NWAs in linear time, and conversely NWAs can be compiled to nondeterministic SHAs in polynomial time. Therefore, we can use SHAs as intermediates for determinizing NWAs, while avoiding the huge size increase with the usual determinization algorithm for NWAs. Notably, the NWAs obtained from the SHAs perform bottom-up and left-to-right computations only, but no top-down computations. This NWA behavior can be distinguished syntactically by the (weak) single-entry property, suggesting a close relationship between SHAs and single-entry NWAs. In particular, it turns out that the usual determinization algorithm for NWAs behaves well for single-entry NWAs, while it quickly explodes without the single-entry property. Furthermore, it is known that the class of deterministic multi-module single-entry NWAs enjoys unique minimization. The subclass of deterministic SHAs to which our novel minimization algorithm applies is different though, in that we do not impose multiple modules. As further optimizations for reducing the sizes of the constructed SHAs, we propose schema-based cleaning and symbolic representations based on apply-else rules that can be maintained by determinization. We implemented the optimizations and report the experimental results for the automata constructed for the XPathMark benchmark.

Highlights

  • IntroductionNested words are hierarchical structures that are omnipresent in computer science

  • Nested words are hierarchical structures that are omnipresent in computer science.They were used to represent sequences of data trees, like XML or JSON documents, and to analyze the call structure of recursive programs

  • Nested words are the elements of the least set containing internal letters from a given alphabet, triples consisting of an opening parenthesis, a nested word, and a closing parenthesis, and all sequences of nested words

Read more

Summary

Introduction

Nested words are hierarchical structures that are omnipresent in computer science. They were used to represent sequences of data trees, like XML or JSON documents, and to analyze the call structure of recursive programs. The idea of nested words is to generalize on both words and trees, resulting in sequences of unranked trees that are known as hedges. Otherwise, nested words can be obtained by enriching Dyck words with internal letters, besides opening and closing parentheses. Nested words are the elements of the least set containing internal letters from a given alphabet, triples consisting of an opening parenthesis, a nested word, and a closing parenthesis, and all sequences of nested words. Last but not least, nested words can be seen as words over an alphabet with internal letters, opening parentheses, and closing parentheses, under the conditions that the parenthesis are well nested, so that every opening parenthesis is properly closed and every closing parenthesis properly opened

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call