Abstract

Let S be a string over a finite, ordered alphabet Σ. For any substring S′ of S, the set of distinct characters contained in S′ is called its fingerprint. The text fingerprinting problem consists of constructing a data structure for the string S in advance so that, given any input set C ⊆ Σ of characters, we can answer the following queries efficiently: (1) determine whether C is the fingerprint of some substring of S; (2) find all maximal substrings of S whose fingerprint is equal to C. The best previously known results answer these two queries in Θ(|Σ|) and Θ(|Σ|+K) time, respectively, where K is the number of maximal substrings. In this paper, we propose a new data structure that improves the time complexities of the two queries to O(|C| log(|Σ|/|C|)) and O(|C| log(|Σ|/|C|) + K), respectively, where the term |C| log(|Σ|/|C|) is always O(|Σ|). This result answers the open problem posed by Amir et al. [A. Amir, A. Apostolico, G.M. Landau, G. Satta, Efficient text fingerprinting via Parikh mapping, J. Discrete Algorithms 1 (2003) 409-421]. In addition, our data structure uses less storage than the existing solutions.
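
To make the two queries concrete, the following Python sketch answers them by a direct scan of S, with no preprocessing. It is only a naive baseline for illustration, not the data structure proposed in the paper, and it takes O(|S|) time per query. The function names are hypothetical, and the sketch assumes the usual reading of "maximal": a substring S[i..j] with fingerprint C that cannot be extended to the left or right by a character of C.

```python
def fingerprint(substring):
    """Return the set of distinct characters in a substring."""
    return frozenset(substring)


def maximal_locations(s, c):
    """Query (2), naively: list (start, end) index pairs, inclusive, of all
    maximal substrings of s whose fingerprint equals c.

    Assumption: "maximal" means the substring is bounded on both sides by
    the ends of s or by characters outside c.
    """
    c = frozenset(c)
    found = []
    start = None   # start index of the current run of characters from c
    seen = set()   # characters of c seen in the current run
    for i, ch in enumerate(s):
        if ch in c:
            if start is None:
                start, seen = i, set()
            seen.add(ch)
        else:
            # A character outside c closes the current run, if any.
            if start is not None and seen == c:
                found.append((start, i - 1))
            start = None
    # Close a run that extends to the end of s.
    if start is not None and seen == c:
        found.append((start, len(s) - 1))
    return found


def is_fingerprint(s, c):
    """Query (1), naively: is c the fingerprint of some substring of s?"""
    return bool(maximal_locations(s, c))


if __name__ == "__main__":
    s = "abcab"
    print(maximal_locations(s, {"a", "b"}))   # [(0, 1), (3, 4)]
    print(maximal_locations(s, {"a", "c"}))   # [(2, 3)]
    print(is_fingerprint(s, {"a", "d"}))      # False
```

For S = "abcab" and C = {a, b}, the scan reports K = 2 maximal substrings, at positions 0..1 and 3..4. The point of the preprocessing described above is to avoid this per-query scan of S, so that the query cost depends on |C|, |Σ|, and K rather than on |S|.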
