Linear Space Data Structure Research Articles

The states of a finite-state automaton 𝒩 can be identified with collections of words in the prefix closure of the regular language accepted by 𝒩. But words can be ordered, and among the many possible orders a very natural one is the co-lexicographic order. Such naturalness stems from the fact that it suggests a transfer of the order from words to the automaton’s states. This suggestion is, in fact, concrete and in a number of articles automata admitting a total co-lexicographic ( co-lex for brevity) ordering of states have been proposed and studied. Such class of ordered automata — Wheeler automata — turned out to require just a constant number of bits per transition to be represented and enable regular expression matching queries in constant time per matched character. Unfortunately, not all automata can be totally ordered as previously outlined. In the present work, we lay out a new theory showing that all automata can always be partially ordered, and an intrinsic measure of their complexity can be defined and effectively determined, namely, the minimum width p of one of their admissible co-lex partial orders –dubbed here the automaton’s co-lex width . We first show that this new measure captures at once the complexity of several seemingly-unrelated hard problems on automata. Any NFA of co-lex width p : (i) has an equivalent powerset DFA whose size is exponential in p rather than (as a classic analysis shows) in the NFA’s size; (ii) can be encoded using just Θ(log p ) bits per transition; (iii) admits a linear-space data structure solving regular expression matching queries in time proportional to p 2 per matched character. Some consequences of this new parameterization of automata are that PSPACE-hard problems such as NFA equivalence are FPT in p , and quadratic lower bounds for the regular expression matching problem do not hold for sufficiently small p . Having established that the co-lex width of an automaton is a fundamental complexity measure, we proceed by (i) determining its computational complexity and (ii) extending this notion from automata to regular languages by studying their smallest-width accepting NFAs and DFAs. In this work we focus on the deterministic case and prove that a canonical minimum-width DFA accepting a language ℒ–dubbed the Hasse automaton ℋ of ℒ–can be exhibited. ℋ provides, in a precise sense, the best possible way to (partially) order the states of any DFA accepting ℒ, as long as we want to maintain an operational link with the (co-lexicographic) order of ℒ’s prefixes. Finally, we explore the relationship between two conflicting objectives: minimizing the width and minimizing the number of states of a DFA. In this context, we provide an analogue of the Myhill-Nerode Theorem for co-lexicographically ordered regular languages.

Read full abstract

Consider an ordinal tree T on n nodes, each of which is assigned a category from an alphabet [σ]={1,2,…,σ}. We preprocess the tree T in order to support categorical path counting queries, which ask for the number of distinct categories occurring on the path in T between two query nodes x and y. For this problem, we propose a linear-space data structure with query time O(nlg⁡lg⁡σlg⁡w), where w=Ω(lg⁡n) is the word size in the word-RAM. As shown in our proof, from the assumption that matrix multiplication cannot be solved in time polynomially faster than cubic (with only combinatorial methods), our result is optimal, save for polylogarithmic speed-ups. For a trade-off parameter 1≤t≤n, we propose an O(n+n2t2)-word, O(tlg⁡lg⁡σlg⁡w) query time data structure. We also consider c-approximate categorical path counting queries, which must return an approximation to the number of distinct categories occurring on the query path, by counting each such category at least once and at most c times. We describe a linear-space data structure that supports 2-approximate categorical path counting queries in O(lg⁡n/lg⁡lg⁡n) time.Next, we generalize the categorical path counting queries to weighted trees. Here, a query specifies two nodes x,y and an orthogonal range Q. The answer to thus formed categorical path range counting query is the number of distinct categories occurring on the path from x to y, if only the nodes with weights falling inside Q are considered. We propose an O(nlg⁡lg⁡n+(n/t)4)-word data structure with O(tlg⁡lg⁡n) query time, or an O(n+(n/t)4)-word data structure with O(tlgϵ⁡n) query time. For an appropriate choice of the trade-off parameter t, this implies a linear-space data structure with O(n3/4lgϵ⁡n) query time. We then extend the approach to the trees weighted with vectors from [n]d, where d is a constant integer greater than or equal to 2. We present a data structure with O(nlgd−1+ϵ⁡n+(n/t)2d+2) words of space and O(tlgd−1⁡n(lg⁡lg⁡n)d−2) query time. For an O(n⋅polylogn)-space solution, one thus has O(n2d+12d+2⋅polylogn) query time.The inherent difficulty revealed by the lower bound we proved motivated us to consider data structures based on sketching. In unweighted trees, we propose a sketching data structure to solve the approximate categorical path counting problem which asks for a (1±ϵ)-approximation (i.e. within 1±ϵ of the true answer) of the number of distinct categories on the given path, with probability 1−δ, where 0<ϵ,δ<1 are constants. The data structure occupies O(n+ntlg⁡n) words of space, for the query time of O(tlg⁡n). For trees weighted with d-dimensional weight vectors (d≥1), we propose a data structure with O((n+ntlg⁡n)lgd⁡n) words of space and O(tlgd+1⁡n) query time.All these problems generalize the corresponding categorical range counting problems in Euclidean space Rd+1, for respective d, by replacing one of the dimensions with a tree topology.

Read full abstract

Linear Space Data Structure Research Articles

Related Topics

Articles published on Linear Space Data Structure

Terminal Embeddings in Sublinear Time

Predecessor on the Ultra-Wide Word RAM

String Indexing with Compressed Patterns

Co-lexicographically Ordering Automata and Regular Languages - Part I

Data structures for categorical path counting queries

Tree path majority data structures

A linear-space data structure for range-LCP queries in poly-logarithmic time

Ranked document selection

A Linear Space Data Structure for Range LCP Queries*

Towards an Optimal Method for Dynamic Planar Point Location

Constant-Time Tree Traversal and Subtree Equality Check for Grammar-Compressed Trees

Finding influential communities in massive networks

On the Most Likely Voronoi Diagram and Nearest Neighbor Searching

Document retrieval with one wildcard

Computing minimal and maximal suffixes of a substring

An Efficient Data Structure for Range Selection Queries

Space efficient data structures for dynamic orthogonal range counting

Efficient Data Structures for Range Selections Problem

Linear-Space Data Structures for Range Mode Query in Arrays

A compact routing scheme and approximate distance oracle for power-law graphs

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Linear Space Data Structure Research Articles

Related Topics

Articles published on Linear Space Data Structure

Terminal Embeddings in Sublinear Time

Predecessor on the Ultra-Wide Word RAM

String Indexing with Compressed Patterns

Co-lexicographically Ordering Automata and Regular Languages - Part I

Data structures for categorical path counting queries

Tree path majority data structures

A linear-space data structure for range-LCP queries in poly-logarithmic time

Ranked document selection

A Linear Space Data Structure for Range LCP Queries*

Towards an Optimal Method for Dynamic Planar Point Location

Constant-Time Tree Traversal and Subtree Equality Check for Grammar-Compressed Trees

Finding influential communities in massive networks

On the Most Likely Voronoi Diagram and Nearest Neighbor Searching

Document retrieval with one wildcard

Computing minimal and maximal suffixes of a substring

An Efficient Data Structure for Range Selection Queries

Space efficient data structures for dynamic orthogonal range counting

Efficient Data Structures for Range Selections Problem

Linear-Space Data Structures for Range Mode Query in Arrays

A compact routing scheme and approximate distance oracle for power-law graphs