A new text compression technique based on language structure

K Ibrahim Akman

doi:10.1177/016555159502100203

Journal: Journal of Information Science
Publication Date: Apr 1, 1995
Citations: 5

Affiliation: Middle East Technical University

Abstract

This paper describes a new data compression technique which utilises some of the common structural characteristics of languages. The proposed algorithm partitions a word into its root and suffix(es), which are then replaced by shorter bit representations. The method uses three dictionaries in the form of binary search trees and one character array. The first two dictionaries are for roots, whereas the third one is for suffixes. The character array is used both for searching compressible words and for coding incompressible words. The number of bits used to represent a substring depends on the number of entries in the dictionary in which the substring is found. The proposed algorithm is implemented for the Turkish language and tested using three text groups of different lengths. The results indicate a compression of up to 47%.
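
The root-and-suffix idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: it assumes a single root dictionary instead of two, uses plain Python sets in place of binary search trees, omits the character array and any flag or separator bits, and splits words greedily (longest root, then longest suffix at each step). The names `code_width`, `split_word`, and `compressed_bits`, and the tiny Turkish word lists, are hypothetical and chosen only for illustration. The one element taken directly from the abstract is that the code length for a substring depends on the number of entries in the dictionary that holds it.

```python
from math import ceil, log2


def code_width(n_entries: int) -> int:
    """Bits needed to index one entry of a dictionary with n_entries entries."""
    return max(1, ceil(log2(n_entries)))


def split_word(word, roots, suffixes):
    """Return (root, [suffix, ...]) or None if the word cannot be fully parsed.

    Assumption: greedy matching, which is a simplification and may miss
    parses that a full morphological analysis would find.
    """
    for end in range(len(word), 0, -1):  # try the longest root first
        root = word[:end]
        if root not in roots:
            continue
        rest, parts = word[end:], []
        while rest:
            # Longest suffix in the dictionary that prefixes the remainder.
            match = next(
                (rest[:k] for k in range(len(rest), 0, -1) if rest[:k] in suffixes),
                None,
            )
            if match is None:
                break
            parts.append(match)
            rest = rest[len(match):]
        if not rest:  # the whole word was consumed
            return root, parts
    return None


def compressed_bits(word, roots, suffixes):
    """Bit cost of a word coded as one root index plus one index per suffix."""
    parsed = split_word(word, roots, suffixes)
    if parsed is None:
        return None  # incompressible here; the paper codes such words separately
    _, parts = parsed
    return code_width(len(roots)) + len(parts) * code_width(len(suffixes))


# Hypothetical toy dictionaries (4 roots -> 2 bits, 8 suffixes -> 3 bits).
ROOTS = {"ev", "kitap", "göz", "okul"}
SUFFIXES = {"ler", "lar", "im", "de", "da", "den", "dan", "i"}

if __name__ == "__main__":
    for w in ("evlerimde", "kitaplar", "xyz"):
        print(w, split_word(w, ROOTS, SUFFIXES), compressed_bits(w, ROOTS, SUFFIXES))
```

With these toy dictionaries, "evlerimde" splits into the root "ev" and the suffixes "ler", "im", "de", costing 2 + 3 × 3 = 11 bits under the sketch's assumptions, against 72 bits for nine 8-bit characters. A real coder would also need bits to mark word boundaries and to distinguish compressed from uncompressed words, so this figure is not comparable to the 47% reported in the abstract.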
