New adaptive compressors for natural language text

N R Brisaboa, G Navarro, J R Parama, A Fariña

doi:10.1002/spe.882

Abstract

Semistatic byte‐oriented word‐based compression codes have been shown to be an attractive alternative for compressing natural language text databases, because of the combination of speed, effectiveness, and direct searchability they offer. In particular, our recently proposed family of dense compression codes has been shown to be superior to the more traditional byte‐oriented word‐based Huffman codes in most aspects. In this paper, we focus on the problem of transmitting texts among peers that do not share the vocabulary. This is the typical scenario for adaptive compression methods. We design adaptive variants of our semistatic dense codes, showing that they are much simpler and faster than dynamic Huffman codes and reach almost the same compression effectiveness. We show that our variants have a very compelling trade‐off between compression/decompression speed, compression ratio, and search speed compared with most of the state‐of‐the‐art general compressors. Copyright © 2008 John Wiley & Sons, Ltd.
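The abstract does not spell out how a dense code assigns codewords. As a rough illustration only, the sketch below follows the End-Tagged Dense Code scheme, one member of the dense code family, in its semistatic form: words are ranked by frequency, and the word at rank i receives a byte sequence whose final byte has its high bit set. The function names (`etdc_encode`, `etdc_decode`, `build_codebook`) and the whitespace tokenization are assumptions made for this example; they are not taken from the paper, and the adaptive variants the paper proposes are not shown.

```python
from collections import Counter


def etdc_encode(rank: int) -> bytes:
    """Return the codeword for the word with 0-based frequency rank `rank`."""
    out = [128 + rank % 128]        # final byte: high bit set marks the end
    rank = rank // 128 - 1
    while rank >= 0:
        out.append(rank % 128)      # earlier bytes stay in the range 0..127
        rank = rank // 128 - 1
    return bytes(reversed(out))


def etdc_decode(code: bytes) -> int:
    """Invert etdc_encode: recover the rank from a single codeword."""
    rank = 0
    for b in code[:-1]:
        rank = (rank + b + 1) * 128
    return rank + code[-1] - 128


def build_codebook(words):
    """Semistatic step (illustrative): rank words by frequency, then assign codes."""
    ranked = [w for w, _ in Counter(words).most_common()]
    return {w: etdc_encode(i) for i, w in enumerate(ranked)}


if __name__ == "__main__":
    book = build_codebook("to be or not to be".split())
    for word, code in book.items():
        print(word, code.hex())
```

Under this scheme the 128 most frequent words get one-byte codewords, the next 128 × 128 get two bytes, and so on. Because only the last byte of a codeword has the high bit set, codeword boundaries can be found without decoding, which is one reason such codes lend themselves to direct searching.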

Journal: Software: Practice and Experience
Publication Date: Apr 1, 2008
Citations: 10

Similar Papers

Lightweight natural language text compression
Nieves R Brisaboa ... Gonzalo Navarro
Information Retrieval | VOL. 10
09 Sep 2006

Natural Language Compression per Blocks
Petr Procházka ... Jan Holub
01 Jun 2011

Computations of high-lift airfoil flows using two-equation turbulence models
Chang Kim ... Oh Rho
11 Jan 1999

Huffman-based code compression techniques for embedded processors
Talal Bonny ... Jörg Henkel
ACM Transactions on Design Automation of Electronic Systems | VOL. 15
01 Sep 2010
