Optimizing alphabet using genetic algorithms

Jan Platos, Pavel Kromer

doi:10.1109/isda.2011.6121705

https://doi.org/10.1109/isda.2011.6121705

Publication Date: Nov 1, 2011

Citations: 5

Affiliation: Technical University of Ostrava

Abstract

Data compression algorithms have usually been designed to process data symbol by symbol. The input symbols of these algorithms are usually taken from the ASCII table, i.e. the input alphabet has 256 symbols, each representable as an 8-bit number. Several other techniques have been developed: syllable-based compression, which uses the syllable as the basic compression symbol, and word-based compression, which uses words as basic symbols. These three approaches are strictly separated, and no overlap between them is allowed. This may be a limitation, because it can be helpful to combine them and use a character-based approach whose alphabet also contains a few symbols that are sequences of characters. This paper describes an algorithm that searches for the optimal alphabet for different text files. The alphabet may contain characters and 2-grams.
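To make the idea of a mixed alphabet concrete, the following is a minimal sketch, not the authors' implementation. It assumes a greedy left-to-right tokenizer and uses the zero-order entropy of the resulting token stream as a rough cost; the function names `tokenize` and `entropy_bits` are hypothetical. A genetic algorithm could plausibly use such a cost as its fitness when searching over candidate sets of 2-grams, but the abstract does not state which fitness function the paper uses.

```python
import math
from collections import Counter


def tokenize(text, bigrams):
    """Greedy split: emit a 2-gram when it is in the alphabet, else one character."""
    tokens = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in bigrams:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def entropy_bits(tokens):
    """Zero-order entropy of the token stream, in bits (assumed cost measure)."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum(c * math.log2(c / total) for c in counts.values())


text = "abababababcd"
plain_cost = entropy_bits(tokenize(text, set()))    # characters only
mixed_cost = entropy_bits(tokenize(text, {"ab"}))   # characters plus the 2-gram "ab"
print(f"characters only: {plain_cost:.2f} bits")
print(f"with 2-gram 'ab': {mixed_cost:.2f} bits")
```

On this toy input the mixed alphabet yields a lower cost than the character-only alphabet. The sketch ignores the cost of transmitting the alphabet itself, which a complete comparison would have to include.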
