Abstract

We extend recent results regarding finding shortest unique substrings (SUSs) to obtain new time-space tradeoffs for this problem and the generalization of finding k-mismatch SUSs. Our new results include the first algorithm for finding a k-mismatch SUS in sublinear space, which we obtain by extending an algorithm by Senanayaka (2019) and combining it with a result on sketching by Gawrychowski and Starikovskaya (2019). We first describe how, given a text T of length n and m words of workspace, with high probability we can find an SUS of length L in O(n(L/m) log L) time using random access to T, or in O(n(L/m) log^2(L) log log σ) time using O((L/m) log^2 L) sequential passes over T. We then describe how, for constant k, with high probability, we can find a k-mismatch SUS in O(n^(1+ϵ) L/m) time using O(n^ϵ L/m) sequential passes over T, again using only m words of workspace. Finally, we also describe a deterministic algorithm that takes O(nτ log σ log n) time to find an SUS using O(n/τ) words of workspace, where τ is a parameter.
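
To make the fingerprinting idea behind the section titled "Tradeoffs with Karp-Rabin Pattern Matching" more concrete, here is a minimal Python sketch that tests whether one candidate substring of T probably occurs only once. It is an illustration only, not the algorithm from the paper: it checks a single candidate in one pass with a constant number of words and makes no attempt to reproduce the stated tradeoffs. The function names, the modulus 2^61 - 1 and the randomly drawn base are assumptions of this sketch.

```python
import random

# Assumption of this sketch: a fixed Mersenne prime as the fingerprint modulus.
MODULUS = (1 << 61) - 1


def count_window_matches(text: str, start: int, length: int, base: int) -> int:
    """Count windows of the given length whose fingerprint equals the
    fingerprint of text[start:start + length].

    True occurrences always match, so the result is never smaller than the
    real number of occurrences; hash collisions can only make it larger.
    """
    n = len(text)
    if length <= 0 or start < 0 or start + length > n:
        raise ValueError("candidate window is out of range")

    # Fingerprint of the candidate, computed character by character.
    target = 0
    for j in range(start, start + length):
        target = (target * base + ord(text[j])) % MODULUS

    top = pow(base, length - 1, MODULUS)  # weight of the oldest character
    window = 0
    matches = 0
    for i in range(n):
        if i >= length:
            # Drop the character that leaves the window (random access to T).
            window = (window - ord(text[i - length]) * top) % MODULUS
        window = (window * base + ord(text[i])) % MODULUS
        if i >= length - 1 and window == target:
            matches += 1
    return matches


def probably_unique(text: str, start: int, length: int, seed=None) -> bool:
    """Monte-Carlo uniqueness test: True is always correct, False may be
    caused by a fingerprint collision with small probability."""
    base = random.Random(seed).randrange(256, MODULUS - 1)
    return count_window_matches(text, start, length, base) == 1
```

For example, `probably_unique("banana", 1, 3)` returns `False` because "ana" occurs twice, and that answer does not depend on the random base. A `True` answer is always correct here because every real occurrence matches its own fingerprint; only a `False` answer can be wrong, and only when two different windows collide.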

Highlights

  • A shortest unique substring (SUS) of a given text T[1..n] is a substring containing a given position T[q] and occurring only once in T, such that every shorter substring containing T[q] occurs at least twice in T

  • We are only interested in position-SUSs and k-mismatch SUSs, which we describe shortly

  • The sketching result is a Monte-Carlo randomized algorithm that takes d patterns of maximum length, scans T one character at a time and, for each position, reports the longest pattern with an occurrence ending at that position; its probability of failure is inversely proportional to any fixed polynomial in n
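
The SUS definition in the first highlight can be checked directly on small inputs. The following Python sketch is a brute-force reference, not one of the algorithms summarized here: it tries lengths in increasing order and returns the first window covering position q that occurs exactly once. The function names and the 0-based, inclusive indexing are assumptions of this sketch (the text above uses 1-based positions).

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlapping ones included."""
    return sum(
        1
        for i in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, i)
    )


def brute_force_sus(text: str, q: int):
    """Return (start, end), 0-based and inclusive, of a shortest substring
    that covers position q and occurs exactly once in text."""
    n = len(text)
    if not 0 <= q < n:
        raise IndexError("q must be a valid position in text")
    for length in range(1, n + 1):
        # Windows of this length that contain position q.
        first = max(0, q - length + 1)
        last = min(q, n - length)
        for start in range(first, last + 1):
            if count_occurrences(text, text[start:start + length]) == 1:
                return start, start + length - 1
    return None  # unreachable for non-empty text: the whole text occurs once
```

On "banana" with q = 2 (the first "n"), every window of length 1 or 2 covering that position occurs twice, so the sketch returns `(0, 2)`, the substring "ban". Because lengths are tried in increasing order, every shorter window covering q has already been found to repeat, which is exactly the condition in the definition.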

Summary

Introduction

A shortest unique substring (SUS) of a given text T[1..n] is a substring containing a given position T[q] and occurring only once in T, such that every shorter substring containing T[q] occurs at least twice in T. Hon et al. gave an O(n^2)-time, O(n)-space construction of an O(n)-space data structure which, given q, returns in O(1) time the endpoints of a k-mismatch SUS for T[q]. All of the data structures return an SUS given q in O(1) time and have the same final space as construction space, except that the construction-space bounds for Ganguly et al.’s third and fourth results are O(n/τ) words plus 4n + o(n) bits of space, and O(n/τ + n/log^c n) words plus.
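
The k-mismatch variant is not defined in this summary, so the following Python sketch rests on an assumed definition: a substring is k-mismatch unique if no other substring of T of the same length, starting at a different position, is within Hamming distance k of it. Under that assumption, a k-mismatch SUS for position q is a shortest k-mismatch unique substring covering q, and k = 0 gives the ordinary SUS. The code is a brute-force reference with hypothetical function names and 0-based, inclusive indexing; it does not reflect the construction by Hon et al. or the sublinear-space algorithm.

```python
def within_k_mismatches(text: str, i: int, j: int, length: int, k: int) -> bool:
    """True if the windows of the given length starting at i and j differ in
    at most k positions (Hamming distance at most k)."""
    mismatches = 0
    for offset in range(length):
        if text[i + offset] != text[j + offset]:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def is_k_mismatch_unique(text: str, start: int, length: int, k: int) -> bool:
    """Assumed definition: no window at a different start is within k mismatches."""
    return not any(
        other != start and within_k_mismatches(text, start, other, length, k)
        for other in range(len(text) - length + 1)
    )


def brute_force_k_mismatch_sus(text: str, q: int, k: int):
    """Return (start, end), 0-based and inclusive, of a shortest k-mismatch
    unique substring covering position q."""
    n = len(text)
    if not 0 <= q < n:
        raise IndexError("q must be a valid position in text")
    for length in range(1, n + 1):
        for start in range(max(0, q - length + 1), min(q, n - length) + 1):
            if is_k_mismatch_unique(text, start, length, k):
                return start, start + length - 1
    return None  # unreachable: the full text has no other window to compare with
```

On "banana" with q = 0, the call with k = 0 returns `(0, 0)` because "b" occurs once, while k = 1 returns `(0, 4)`: each of "b", "ba", "ban" and "bana" is within one mismatch of another window, and "banan" is the first prefix that is not.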

Tradeoffs with Karp-Rabin Pattern Matching
Tradeoffs with Sketching
A Deterministic Algorithm
Finding Occurrences of Suffixes of X
Finding Occurrences of Y
Finding Occurrences of Prefixes of Z
Putting the Occurrences Together
Future Work
