Space–time trade-offs for finding shortest unique substrings and maximal unique matches

Arnab Ganguly,Wing-Kai Hon,Rahul Shah,Sharma V Thankachan

doi:10.1016/j.tcs.2017.08.002

Abstract

Given a string X[1,n] and a position k∈[1,n], a Shortest Unique Substring of X covering k, denoted by Sk, is a substring X[i,j] of X which satisfies the following conditions: (i)i≤k≤j, (ii)i is the only position where there is an occurrence of X[i,j], and (iii)j−i is minimized. Current best-known algorithms for finding Sk require Θ(n) words of working space, and O(n) time. Let τ be a given parameter. We present the following new results.•Given a k∈[1,n], we can compute Sk in O(nτ2log⁡nτ) time using X and an additional O(n/τ) words of working space.•For every k∈[1,n], we can compute Sk in O(nτ2log⁡n) time using X, and an additional O(n/τ) words and 4n+o(n) bits of working space.•We present an O(nτlogc+1⁡n)-time randomized algorithm that uses n/logc⁡n words in addition to that mentioned above, where c≥0 is an arbitrary constant. In this case, the reported string is unique and covers k, but, with probability at most n−O(1), may not be the shortest.By choosing τ=ω(1), our results imply the first sub-linear space (in addition to the input string) solution to these problems. We also present the following two results.•An algorithm that finds Sk for every k∈[1,n] using O(nlog⁡σ) bits of working space in O(nlog⁡n) time, where σ≤n is the number of distinct symbols in X.•A 4n+o(n)-bit index that can report Sk for any k in O(1) time. As a consequence of our techniques, we also obtain similar space-and-time tradeoffs for a related problem of finding Maximal Unique Matches of two strings (Delcher et al., 1999 [14]).

Full Text