Abstract

In this work, we introduce an efficient method for the lossy compression of digitized documents. The method uses a dictionary of class representatives selected with a minimum-entropy criterion. The algorithm first identifies the distinct symbols contained in a document image and then groups them into classes by means of a hierarchical clustering algorithm. For each class, a representative is selected using the principle of minimum entropy together with suitable similarity distances. The technique then creates a file in which every object belonging to a class is replaced by its class representative, and finally this file is compressed. The performance of the proposed algorithm is assessed on digitized files from a standard document-compression database at different resolutions, and the method is compared against other state-of-the-art algorithms. The results show quantitatively that the proposed method is more efficient than the algorithms it is compared against.
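The pipeline summarized above (symbol extraction, hierarchical clustering, minimum-entropy representative selection, substitution, final compression) can be illustrated with a small pure-Python sketch. The abstract does not specify the similarity distances, the linkage rule, the exact form of the minimum-entropy criterion, or the final codec, so every one of these is an assumption here: symbols are 8-connected components of black pixels; the distance is the fraction of mismatching pixels, with differently sized symbols treated as maximally distant; clustering uses complete linkage stopped at a hypothetical `threshold`; the representative is the member whose residual (XOR) bits against the rest of its class have the lowest binary entropy; and zlib stands in for whatever compressor the authors used. All function names (`extract_symbols`, `cluster`, `representative`, `compress_document`) are hypothetical.

```python
import math
import zlib
from collections import deque


def extract_symbols(img):
    """Return (top, left, bitmap) for each 8-connected component of 1-pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    symbols = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            pixels, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill over the 8 neighbours
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys, xs = [p[0] for p in pixels], [p[1] for p in pixels]
            top, left = min(ys), min(xs)
            bmp = [[0] * (max(xs) - left + 1) for _ in range(max(ys) - top + 1)]
            for y, x in pixels:
                bmp[y - top][x - left] = 1
            symbols.append((top, left, tuple(map(tuple, bmp))))
    return symbols


def distance(a, b):
    """Fraction of mismatching pixels; different sizes count as maximally distant (assumption)."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 1.0
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / (len(a) * len(a[0]))


def cluster(bitmaps, threshold):
    """Agglomerative clustering, complete linkage, stopped once the closest pair exceeds threshold."""
    clusters = [[i] for i in range(len(bitmaps))]
    while len(clusters) > 1:
        best, pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(distance(bitmaps[p], bitmaps[q]) for p in clusters[i] for q in clusters[j])
                if best is None or d < best:
                    best, pair = d, (i, j)
        if best > threshold:
            break
        i, j = pair
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source; increasing on [0, 0.5]."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def representative(members):
    """Member whose residual bits against the class have minimum entropy (assumed criterion)."""
    def residual_entropy(cand):
        return binary_entropy(sum(distance(cand, m) for m in members) / len(members))
    return min(members, key=residual_entropy)


def compress_document(img, threshold=0.1):
    """Redraw every symbol as its class representative, then compress the page with zlib."""
    symbols = extract_symbols(img)
    bitmaps = [bmp for _, _, bmp in symbols]
    out = [[0] * len(img[0]) for _ in img]
    for group in cluster(bitmaps, threshold):
        rep = representative([bitmaps[k] for k in group])
        for k in group:  # same-size members only, so rep fits each bounding box
            top, left, _ = symbols[k]
            for dy, row in enumerate(rep):
                for dx, v in enumerate(row):
                    if v:
                        out[top + dy][left + dx] = 1
    return out, zlib.compress(bytes(v for row in out for v in row), 9)
```

For example, with `threshold=0.2`, a page holding two identical 3x3 ring glyphs and one ring with a missing corner forms a single class, and all three are redrawn as the complete ring, which is where the lossy gain comes from. Complete linkage is chosen in this sketch so that every pair inside a class stays within the threshold; keeping the threshold below 0.5 also keeps the binary entropy monotonic in the residual error rate.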
