NATURAL LANGUAGE TEXT COMPRESSION WITH REVERSE MULTI-DELIMITER CODES

A.V. Anisimov, I.O. Zavadskyi, T.S. Chudakov

doi:10.34229/kca2522-9664.24.1.1

Abstract

We study a class of binary reverse multi-delimiter (RMD) data compression codes as applied to natural language text compression. RMD codewords start with delimiters, i.e., prefixes that cannot occur elsewhere in the codeword. This placement differs from that in “direct” multi-delimiter (MD) codes, where delimiters are codeword suffixes. RMD and MD codes possess many useful properties, such as unique decodability, completeness, universality, synchronizability, asymptotic densities, and acceptability by a finite automaton. For RMD codes, we construct a monotonic mapping from the set of natural numbers to the set of codewords; for the original MD codes, this has so far remained an open question. The discovered mapping and the byte quantification of a decoding automaton allow us to develop very fast byte-aligned algorithms for decoding and for direct Boyer–Moore-style search in compressed files. Compared with the known byte codes (SCDC) and Fibonacci codes, RMD codes demonstrate the best compression ratio on natural language texts (more than four times closer to the entropy bound than SCDC). Computer experiments demonstrate that RMD codes can be decoded almost as fast as SCDC and several times faster than Fibonacci codes. We also applied RMD encoding as a preprocessing step in natural language text compression, which improves the performance of well-known, powerful modern archivers.
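To make the role of a delimiter concrete, the following is a minimal sketch of the classical Fibonacci code, one of the baselines named in the abstract. It is not the RMD construction, which the abstract does not specify. In the Fibonacci code, every codeword ends with the delimiter `11`, and that pattern occurs nowhere else in the codeword, so a decoder can split a concatenated bit stream without any length information. The function names `fib_encode` and `fib_decode` and the use of bit strings instead of packed bytes are assumptions made for illustration only.

```python
def fib_encode(n):
    """Return the Fibonacci codeword of a positive integer n as a bit string."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for n >= 1")
    # Fibonacci numbers 1, 2, 3, 5, ... that do not exceed n.
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    if fibs[-1] > n:  # only happens for n == 1
        fibs.pop()
    # Greedy (Zeckendorf) decomposition: no two adjacent Fibonacci
    # numbers are used, so "11" never appears inside the bit list.
    bits = ["0"] * len(fibs)
    remainder = n
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remainder:
            bits[i] = "1"
            remainder -= fibs[i]
    # The highest bit is always 1; appending one more 1 forms the
    # suffix delimiter "11".
    return "".join(bits) + "1"


def fib_decode(stream):
    """Split a concatenation of Fibonacci codewords and return the integers."""
    values = []
    fibs = [1, 2]
    total, pos, prev = 0, 0, "0"
    for bit in stream:
        if bit == "1" and prev == "1":
            # Second 1 of the delimiter: the codeword is complete.
            values.append(total)
            total, pos, prev = 0, 0, "0"
            continue
        if pos >= len(fibs):
            fibs.append(fibs[-1] + fibs[-2])
        if bit == "1":
            total += fibs[pos]
        pos += 1
        prev = bit
    return values
```

For example, `fib_decode("".join(fib_encode(n) for n in [3, 1, 4]))` recovers the three integers. According to the abstract, RMD codes place the delimiter at the start of the codeword instead, while direct MD codes, like this code, place it at the end.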

More From: Kibernetyka ta Systemnyi Analiz


Similar Papers

Comparisonal Analysis Of Even-Rodeh Algorithm Code And Fibonacci Code Algorithm For Text File Compression
Mhd Ali Subada
Journal Basic Science and Technology | VOL. 11
28 Feb 2022

IMPROVEMENT OF THE DATA COMPRESSION METHOD BASED ON THE FIBONACCI CODE
Т В Миронюк ... А В Чепеленко
Вісник Черкаського державного технологічного університету. Серія: Технічні науки | VOL. 1
26 Nov 2018

Comparative Analysis of Audio File Compression Using the Shannon-Fano Algorithm and the Fibonacci Code Algorithm
Sari Magdalena Simanjuntak
Jurnal Kajian Ilmiah Teknologi Informasi dan Komputer | VOL. 2
17 Jan 2024

Re-Ordered FEGC and Block Based FEGC for Inverted File Compression
V Glory ... S Domnic
International Journal of Information Retrieval Research | VOL. 3
01 Jan 2013
