Abstract

The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property \(\varPi \), i.e. a set of \(\varPi \)-valid intervals over x. A suffix-tree-like index over these valid prefixes of suffixes of x can be built in time and space \(\mathcal {O}(n)\). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space \(\mathcal {O}(n)\). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold \(\frac{1}{z}\), we say that a string p of length m matches a weighted sequence X of length n at starting position i if the product of probabilities of the letters of p at positions \(i,\ldots ,i+m-1\) in X is at least \(\frac{1}{z}\). Our algorithm for building the PSA can be directly applied to build an \(\mathcal {O}(nz)\)-sized suffix-array-like index over X in time and space \(\mathcal {O}(nz)\).
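
The two definitions above can be made concrete with a small sketch. It is not the \(\mathcal {O}(n)\)-time construction announced here: it sorts truncated suffixes by plain comparison and so takes superlinear time. Several choices are assumptions made only for illustration: positions are 0-based, \(\varPi \) is encoded as inclusive (start, end) pairs, positions with an empty valid prefix are left out, ties are broken by starting position, and the function names are hypothetical.

```python
from math import prod


def longest_valid_prefix_lengths(n, intervals):
    """For each start i, the length of the longest prefix of x[i:] that fits in one interval.

    intervals: iterable of (start, end) pairs, 0-based and inclusive (assumed encoding).
    """
    reach = [-1] * n  # reach[s] = furthest end among intervals that start at s
    for s, e in intervals:
        reach[s] = max(reach[s], e)
    lengths = []
    best_end = -1  # furthest end among intervals starting at or before i
    for i in range(n):
        best_end = max(best_end, reach[i])
        lengths.append(max(0, best_end - i + 1))
    return lengths


def naive_property_suffix_array(x, intervals):
    """Sort start positions by their longest valid prefix (naive, not linear time)."""
    lengths = longest_valid_prefix_lengths(len(x), intervals)
    # Assumption: starts with an empty valid prefix are omitted.
    starts = [i for i in range(len(x)) if lengths[i] > 0]
    # Assumption: equal truncated suffixes are ordered by start position.
    return sorted(starts, key=lambda i: (x[i:i + lengths[i]], i))


def matches(p, X, i, z):
    """True if p matches weighted sequence X at 0-based start i with threshold 1/z.

    X: list of dicts mapping each letter to its probability at that position.
    """
    if i < 0 or i + len(p) > len(X):
        return False
    return prod(X[i + j].get(c, 0.0) for j, c in enumerate(p)) >= 1 / z
```

For example, with x = "banana" and intervals (0, 2) and (2, 5), the longest valid prefixes are "ban", "an", "nana", "ana", "na" and "a" for starts 0 to 5, so the sketch returns the order 5, 1, 3, 0, 4, 2.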
