A Faster Query Algorithm for the Text Fingerprinting Problem

Chi-Yuan Chan, Wing-Kai Hon, Hung-I Yu, Biing-Feng Wang

doi:10.1007/978-3-540-75520-3_13

Abstract

Let S be a string over a finite, ordered alphabet Σ. For any substring S′ of S, the set of distinct characters contained in S′ is called its fingerprint. The text fingerprinting problem consists of constructing a data structure for the string S in advance, so that, given any input set C ⊆ Σ of characters, we can answer the following queries efficiently: (1) determine whether C is the fingerprint of some substring of S; (2) find all maximal substrings of S whose fingerprint is equal to C. The best previously known results answer these two queries in Θ(|Σ|) and Θ(|Σ|+K) time, respectively, where K is the number of maximal substrings. In this paper, we propose a new data structure that improves the time complexities of the two queries to O(|C| log(|Σ|/|C|)) and O(|C| log(|Σ|/|C|) + K) time, respectively, where the term |C| log(|Σ|/|C|) is always bounded by Θ(|Σ|). This result answers the open problem posed by Amir et al. [A. Amir, A. Apostolico, G.M. Landau, G. Satta, Efficient text fingerprinting via Parikh mapping, J. Discrete Algorithms 1 (2003) 409-421]. In addition, our data structure uses less storage than the existing solutions.
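
To make the two queries concrete, the following is a minimal brute-force sketch in Python. It is not the data structure proposed in the paper: it does no preprocessing and scans S in O(|S|) time per query, so it serves only as a reference for what the queries return. The function names `maximal_locations` and `is_fingerprint` are hypothetical. The sketch also assumes that a substring is "maximal" when it cannot be extended to the left or right without changing its fingerprint, which the abstract does not spell out.

```python
def maximal_locations(s, c):
    """Query (2), naive version: return 0-based inclusive (start, end)
    pairs of all maximal substrings of s whose fingerprint equals c.

    Assumption: "maximal" means the substring cannot be extended in
    either direction without picking up a character outside c.
    """
    c = frozenset(c)
    locations = []
    i, n = 0, len(s)
    while i < n:
        if s[i] not in c:
            i += 1
            continue
        # Extend a run consisting only of characters from c.
        j = i
        seen = set()
        while j < n and s[j] in c:
            seen.add(s[j])
            j += 1
        # The run is a maximal substring with fingerprint c only if
        # every character of c occurs in it.
        if seen == c:
            locations.append((i, j - 1))
        i = j
    return locations


def is_fingerprint(s, c):
    """Query (1), naive version: is c the fingerprint of some substring of s?"""
    return bool(maximal_locations(s, c))
```

For example, with S = "abacdab", `maximal_locations("abacdab", {"a", "b"})` reports the substrings "aba" and "ab" at positions (0, 2) and (5, 6), while `is_fingerprint("abacdab", {"b", "d"})` is false because no substring contains exactly the characters b and d.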
