Abstract

Efficient bit representation, or compression, of documents is an important issue in many applications. The achievable compression depends on the document's contents (written script, diagrams, tables, etc.), and these contents also determine the limit of the compression. In CCITT Recommendation T.4, ‘Standardization of group 3 apparatus for document transmission’, a modified Huffman code was chosen as the standard compression technique [1]. That selection was based on examining documents with contents of different natures. Given the cursive nature of printed Arabic and the dominance of certain shapes in it, it is natural to ask how efficiently the chosen standard compresses documents with printed Arabic contents. For this purpose, more than ten documents containing printed Arabic script were scanned and analyzed in this paper. Both the entropy, based on the Capon model [5], and the compression rates obtained with the modified Huffman code are calculated. Our results show that the CCITT coding standard appears to be robust for documents with printed Arabic script.
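As a rough illustration of the kind of entropy calculation mentioned above, the following minimal sketch estimates an upper bound on run-length compression from the empirical distributions of white and black run lengths. It assumes the commonly used run-length form of the bound, Q_max = (mean white run + mean black run) / (H_white + H_black); whether the paper uses exactly this form of the Capon model is an assumption, and the function names and the 0 = white, 1 = black pixel convention are hypothetical choices made for this example.

```python
from collections import Counter
from math import log2


def run_lengths(row):
    """Split a scan line of 0/1 pixels into (colour, length) runs."""
    runs = []
    start = 0
    for i in range(1, len(row) + 1):
        # Close the current run at the end of the line or on a colour change.
        if i == len(row) or row[i] != row[start]:
            runs.append((row[start], i - start))
            start = i
    return runs


def entropy_bits(lengths):
    """Empirical entropy, in bits per run, of a list of run lengths."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def max_compression_ratio(rows):
    """Assumed bound: (mean white + mean black run) / (H_white + H_black)."""
    white, black = [], []
    for row in rows:
        for colour, length in run_lengths(row):
            (black if colour else white).append(length)
    if not white or not black:
        raise ValueError("need at least one white and one black run")
    total_entropy = entropy_bits(white) + entropy_bits(black)
    if total_entropy == 0:
        return float("inf")  # every run has the same length: no information
    mean_white = sum(white) / len(white)
    mean_black = sum(black) / len(black)
    return (mean_white + mean_black) / total_entropy


if __name__ == "__main__":
    # Tiny made-up scan line (0 = white, 1 = black), not data from the paper.
    print(max_compression_ratio([[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0]]))
```

Comparing a bound of this kind with the ratio actually achieved by the modified Huffman code tables gives a sense of how close the standard comes to the limit for a given class of documents.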
