Faster query algorithms for the text fingerprinting problem

Chi-Yuan Chan,Hung-I Yu,Wing-Kai Hon,Biing-Feng Wang

doi:10.1016/j.ic.2011.04.001

Chi-Yuan Chan, Hung-I Yu + Show 2 more

https://doi.org/10.1016/j.ic.2011.04.001

Copy DOI

Journal: Information and Computation	Publication Date: Apr 7, 2011
Citations: 4	License type: elsevier-specific: oa user license

Affiliation: National Tsing Hua University

Abstract

Let S be a string over a finite, ordered alphabet Σ. For any substring S ′ of S, the set of distinct characters contained in S ′ is called its fingerprint. The text fingerprinting indexing problem is to construct a data structure for the string S in advance, so that on given any input subset C of Σ, we can answer the following queries efficiently: (1) determine if C represents a fingerprint of some substrings in S; (2) find all maximal substrings of S whose fingerprint is C. The best known results solved these two queries in Θ ( | Σ | ) and Θ ( | Σ | + K ) time, respectively, where K is the number of maximal substrings. In this paper, we propose two improved algorithms for the text fingerprinting indexing problem. The first one solves the two queries in O ( | C | log n ) and O ( | C | log n + K ) time, respectively. For the second one, the query time complexities are further reduced to O ( | C | log ( | Σ | / | C | ) ) and O ( | C | log ( | Σ | / | C | ) + K ) . Both results answer an open problem proposed by Amir et al.

Full Text