FINDING CHARACTERISTIC SUBSTRINGS FROM COMPRESSED TEXTS

Shunsuke Inenaga,Hideo Bannai

doi:10.1142/s0129054112400126

Journal: International Journal of Foundations of Computer Science
Publication Date: Feb 1, 2012
Citations: 25

Affiliation: Kyushu University

#Straight Line Program #Compressed String

Abstract

Text mining from large-scale data is of great importance in computer science. In this paper, we consider fundamental text mining problems on compressed strings: computing a longest repeating substring, a longest non-overlapping repeating substring, a most frequent substring, and a most frequent non-overlapping substring of a given compressed string. We also tackle the following novel problem: given a compressed text and a compressed pattern, compute the representative of the equivalence class of the pattern with respect to the text. We present algorithms that solve the above problems in time polynomial in the size of the input compressed strings. The compression scheme we consider is the straight line program (SLP), which can achieve exponential compression; our algorithms are therefore more efficient than any existing algorithm that requires decompressing the given SLPs.
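
To make the setting concrete, the following is a minimal sketch of an SLP, assuming the common formulation in which every rule is either a single character or the concatenation of two earlier variables. It is not the paper's algorithm. The names `lengths`, `char_at`, `expand`, `fibonacci_slp`, and `longest_repeating_naive` are hypothetical and chosen for illustration only. The sketch shows why decompression is costly (a Fibonacci-string SLP with 40 rules derives a text of more than 10^8 characters) and gives a naive decompress-then-search baseline for the longest repeating substring problem, which is the kind of approach the abstract contrasts against.

```python
from typing import Dict, Tuple, Union

# A rule is a single character (terminal) or a pair of earlier variable ids.
Rule = Union[str, Tuple[int, int]]
SLP = Dict[int, Rule]


def lengths(slp: SLP) -> Dict[int, int]:
    """Length of the string derived by each variable, computed without expansion."""
    out: Dict[int, int] = {}
    for var in sorted(slp):  # assumption: rules only refer to smaller ids
        rule = slp[var]
        out[var] = 1 if isinstance(rule, str) else out[rule[0]] + out[rule[1]]
    return out


def char_at(slp: SLP, lens: Dict[int, int], var: int, i: int) -> str:
    """Return the i-th character (0-based) of the string derived by var."""
    while True:
        rule = slp[var]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if i < lens[left]:
            var = left           # position lies in the left part
        else:
            i -= lens[left]      # skip the left part, continue on the right
            var = right


def expand(slp: SLP, var: int) -> str:
    """Naive decompression; the output can be exponentially longer than the SLP."""
    rule = slp[var]
    if isinstance(rule, str):
        return rule
    return expand(slp, rule[0]) + expand(slp, rule[1])


def fibonacci_slp(n: int) -> SLP:
    """SLP with n rules deriving the n-th Fibonacci string (X1 = b, X2 = a)."""
    slp: SLP = {1: "b", 2: "a"}
    for k in range(3, n + 1):
        slp[k] = (k - 1, k - 2)
    return slp


def longest_repeating_naive(text: str) -> str:
    """Baseline on the DECOMPRESSED text: a longest substring occurring twice."""
    for length in range(len(text) - 1, 0, -1):
        seen = set()
        for i in range(len(text) - length + 1):
            piece = text[i:i + length]
            if piece in seen:
                return piece
            seen.add(piece)
    return ""


small = fibonacci_slp(6)
small_text = expand(small, 6)
big = fibonacci_slp(40)
big_lens = lengths(big)
```

Here `lengths(big)[40]` is obtained from only 40 rules, and `char_at` answers a single-position query by walking down the derivation instead of materialising the text. The baseline `longest_repeating_naive` needs the full expansion, so its cost grows with the decompressed length rather than with the number of rules.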
