Linear Space Representation Research Articles

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string, report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index f, called the finger, and the query index i. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant where moving the finger is also supported, in time that depends on the distance moved. Let n be the size of the grammar, and let N be the size of the string. For the static variant we give a linear space representation that supports placing the finger in O(log N) time and subsequently accessing in O(log D) time, where D is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in O(log N) time and accessing and moving the finger in O(log D + log log N) time. Compared to the best linear space solution to random access, we improve an O(log N) query bound to O(log D) for the static variant and to O(log D + log log N) for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar-compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars.
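
For reference only, the sketch below shows the textbook baseline for random access on a grammar that derives a single string (a straight-line program, SLP): store the expansion length of every nonterminal, then descend from the start symbol, choosing the child whose expansion contains the query position. Its cost is proportional to the grammar's height, not the O(log N) or O(log D) bounds stated above, and it is not the paper's finger-search structure. The dictionary encoding and the names `expansion_lengths`, `access` and `expand` are hypothetical, chosen for illustration.

```python
from typing import Dict, Tuple, Union

# Hypothetical SLP encoding: each nonterminal maps either to one character
# (a terminal rule) or to a pair (left, right) of nonterminals.
Rules = Dict[str, Union[str, Tuple[str, str]]]


def expansion_lengths(rules: Rules) -> Dict[str, int]:
    """Compute the length of the string each nonterminal derives."""
    lengths: Dict[str, int] = {}

    def length(sym: str) -> int:
        if sym not in lengths:
            rhs = rules[sym]
            if isinstance(rhs, str):
                lengths[sym] = 1
            else:
                lengths[sym] = length(rhs[0]) + length(rhs[1])
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def access(rules: Rules, lengths: Dict[str, int], start: str, i: int) -> str:
    """Return character i (0-based) of the string derived by `start`.

    Each step moves to the child whose expansion contains position i, so the
    cost is proportional to the height of the derivation tree.
    """
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    sym = start
    while True:
        rhs = rules[sym]
        if isinstance(rhs, str):
            return rhs
        left, right = rhs
        if i < lengths[left]:
            sym = left
        else:
            i -= lengths[left]  # skip the left child's expansion
            sym = right


def expand(rules: Rules, sym: str) -> str:
    """Fully decompress a nonterminal (only for checking small examples)."""
    rhs = rules[sym]
    if isinstance(rhs, str):
        return rhs
    return expand(rules, rhs[0]) + expand(rules, rhs[1])


# Example grammar deriving "ababab".
rules: Rules = {
    "A": "a",
    "B": "b",
    "C": ("A", "B"),  # "ab"
    "D": ("C", "C"),  # "abab"
    "S": ("D", "C"),  # "ababab"
}
lengths = expansion_lengths(rules)
print(access(rules, lengths, "S", 4))  # prints a
```

A finger-search representation aims to replace this full root-to-leaf descent with a search whose cost depends on the distance D from the finger; the van Emde Boas style decomposition the abstract mentions is not reproduced here.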

The number of parameters required by a word N-gram model is equal to the N-th power of the vocabulary size. As a result, compressing the parameter space can be vital, depending on the application domain. In this research, singular value decomposition (SVD) is applied to an N-pair word co-occurrence matrix, and word and phrase states are represented as vectors in a K-dimensional space. The authors then compress the N-gram probability parameter space by approximating the original matrix with one of lower dimension. The results show that, in this vector space, the trigram model can be represented with roughly 17.5% fewer parameters. In addition, words are clustered based on distance in the defined space, and the authors investigate whether words are positioned appropriately in the linear space. A comparison at an equal number of parameters confirms that the resulting entropy is lower than that of a class model obtained by maximizing mutual information, and that the word positioning is good. © 2003 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 86(8): 61–70, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.10106
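
To make the parameter saving of a lower-dimensional approximation concrete, the sketch below counts stored values for an explicit m x n matrix (m*n) against a rank-K truncated SVD, A ≈ U_K Σ_K V_K^T, which stores K(m + n + 1) values. It assumes a square co-occurrence matrix with hypothetical sizes (5,000 words, K = 200); these numbers are illustrative, do not come from the paper, and do not reproduce its 17.5% figure, which depends on how the paper builds its matrix and probability model. The helper names are hypothetical.

```python
def full_matrix_params(rows: int, cols: int) -> int:
    """Values stored by an explicit rows x cols co-occurrence matrix."""
    return rows * cols


def truncated_svd_params(rows: int, cols: int, k: int) -> int:
    """Values stored by a rank-k approximation U_k diag(s_k) V_k^T.

    U_k is rows x k, s_k holds k singular values, and V_k is cols x k.
    """
    return rows * k + k + cols * k


def compression_ratio(rows: int, cols: int, k: int) -> float:
    """Fraction of the full matrix's parameter count kept by the rank-k factors."""
    return truncated_svd_params(rows, cols, k) / full_matrix_params(rows, cols)


# Hypothetical sizes: 5,000-word vocabulary, square matrix, K = 200 dimensions.
vocab, k = 5000, 200
print(full_matrix_params(vocab, vocab))               # 25000000
print(truncated_svd_params(vocab, vocab, k))          # 2000200
print(round(compression_ratio(vocab, vocab, k), 4))   # 0.08
```

The singular values can also be folded into U_K, giving K(m + n) values; either way the saving grows as K shrinks relative to m and n, at the cost of approximation error.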

Articles published on Linear Space Representation

Finger Search in Grammar-Compressed Strings

Computing Minimum Cycle Bases in Weighted Partial 2-Trees in Linear Time

Interference of probabilities and number field structure of quantum models

A linear space representation of language probability through SVD of N‐gram matrix

Planar stage graphs: Characterizations and applications