Word-Based Fixed and Flexible List Compression

Ebru Celikel, Mehmet E Dalkilic, Gokhan Dalkilic

doi:10.1007/11569596_80

Abstract

We present a dictionary-based lossless text compression scheme in which frequent words are kept in separate lists (list_n contains words of length n). We pursued two alternatives for the lengths of the lists. In the fixed approach all lists hold the same number of words, whereas in the flexible approach no such constraint is imposed. Results clearly show that the flexible scheme is much better in all test cases, possibly because it can accommodate short, medium, or long word lists that reflect the word-length distribution of a particular language. Our approach encodes a word as a prefix (the length of the word) followed by the body of the word (an index into the corresponding list). For prefix encoding we employed both a static encoding and a dynamic (Huffman) encoding based on the word-length statistics of the source language. Dynamic prefix encoding clearly outperformed its static counterpart in all cases. A language with a higher average word length can, theoretically, benefit more from a word-list-based compression approach than one with a lower average word length. We put this hypothesis to the test using Turkish and English, with average word lengths of 6.1 and 4.4, respectively. Our results strongly support the validity of this hypothesis.
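The prefix-plus-index idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (build_lists, huffman_code_lengths, encode, decode), the frequency-based ordering within each list, and the max_per_list parameter used to imitate the fixed variant are all assumptions made for illustration. The abstract does not say how words missing from the lists are handled or how indices are laid out in bits, so the sketch returns None for unknown words and stops at the (prefix, index) pair.

```python
import heapq
from collections import Counter


def build_lists(words, max_per_list=None):
    """Group distinct words by length, most frequent first (assumed ordering).

    max_per_list=None imitates the flexible variant (no size constraint);
    an integer caps every list at the same size, imitating the fixed variant.
    """
    freq = Counter(words)
    lists = {}
    for word, _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        bucket = lists.setdefault(len(word), [])
        if max_per_list is None or len(bucket) < max_per_list:
            bucket.append(word)
    return lists


def huffman_code_lengths(weights):
    """Return {symbol: code length in bits} for a Huffman code over weights."""
    if len(weights) == 1:
        return {next(iter(weights)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(weights.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(word, lists):
    """Return (prefix, index): the word length and its position in list_n.

    Returns None when the word is not in its list; escape handling is
    not described in the abstract and is left out here.
    """
    bucket = lists.get(len(word), [])
    return (len(word), bucket.index(word)) if word in bucket else None


def decode(prefix, index, lists):
    """Look the word up again from its length prefix and list index."""
    return lists[prefix][index]


if __name__ == "__main__":
    text = "the cat and the dog and the bird".split()
    flexible = build_lists(text)
    pair = encode("and", flexible)
    print(pair, decode(*pair, flexible))
    # Word-length statistics drive the dynamic (Huffman) prefix code.
    print(huffman_code_lengths(Counter(len(w) for w in text)))
```

In this sketch the Huffman code is built over word lengths rather than over words, so a frequent length gets a short prefix while the index selects the word within list_n. Calling build_lists with an integer max_per_list gives every list the same cap, which is one way to read the fixed variant.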
