We give new solutions to the Searchable Partial Sums with Indels problem. Given a sequence of n k -bit numbers, we present a structure taking kn + o ( kn ) bits of space, able of performing operations sum , search , insert , and delete , all in O (log n ) worst-case time, for any k = O (log n ). This extends previous results by Hon et al. [2003c] achieving the same space and O (log n /log log n ) time complexities for the queries, yet offering complexities for insert and delete that are amortized and worse than ours, and supported only for k = O (1). Our result matches an existing lower bound for large values of k . We also give new solutions to the Dynamic Sequence problem. Given a sequence of n symbols in the range [1,σ] with binary zero-order entropy H 0 , we present a dynamic data structure that requires nH 0 + o ( n log σ) bits of space, which is able of performing rank and select , as well as inserting and deleting symbols at arbitrary positions, in O (log n log σ) time. Our result is the first entropy-bound dynamic data structure for rank and select over general sequences. In the case σ = 2, where both previous problems coincide, we improve the dynamic solution of Hon et al. [2003c] in that we compress the sequence. The only previous result with entropy-bound space for dynamic binary sequences is by Blandford and Blelloch [2004], which has the same complexities as our structure, but does not achieve constant 1 multiplying the entropy term in the space complexity. Finally, we present a new dynamic compressed full-text self-index, for a collection of texts over an alphabet of size σ, of overall length n and h th order empirical entropy H h . The index requires nH h + o ( n log σ) bits of space, for any h ≤ α log sigma n and constant 0 < α < 1. It can count the number of occurrences of a pattern of length m in time O ( m log n log σ). Each such occurrence can be reported in O (log 2 n log log n ) time, and displaying a context of length ℓ from a text takes time O (log n (ℓ log σ + log n log log n )). Insertion/deletion of a text to/from the collection takes O (log n log σ) time per symbol. This significantly improves the space of a previous result by Chan et al. [2004] in exchange for a slight time complexity penalty. We achieve at the same time the first dynamic index requiring essentially nH h bits of space, and the first construction of a compressed full-text self-index within that working space. Previous results achieve at best O ( nH h space with constants larger than 1 [Ferragina and Manzini 2000; Arroyuelo and Navarro 2005] and higher time complexities. An important result we prove in this paper is that the wavelet tree of the Burrows-Wheeler transform of a text, if compressed with a technique that achieves zero-order compression locally (e.g., Raman et al. [2002]), automatically achieves h th order entropy space for any h . This unforeseen relation is essential for the results of the previous paragraph, but it also derives into significant simplifications on many existing static compressed full-text self-indexes that build on wavelet trees.
Read full abstract