Abstract

Digital libraries have been developed as a way to make digital information available through the Internet. This is particularly useful when the information comes from historical documents. This research takes place within the PROHIST Project [Mello et al., 2008], which aims to create a digital library with methods to preserve and disseminate images of historical documents. In general, original documents must be handled carefully because, with age, the paper becomes more susceptible to wear and tear. Digitization is the most efficient way to make the documents more easily accessible. In digital media, such as digital images, the documents can be viewed and copied. Digitization also helps preserve the documents, as they are digitized in high resolution and in true color. It is common to store these images in the JPEG file format (Sayood, 1996), which offers a good trade-off between storage space and quality. However, even in this format, accessing an archive of thousands of high-quality true-color images is not an easy task, even with the widespread use of broadband Internet. The storage space of the images can be reduced by converting them to black-and-white images. In this bi-level format, stored as a GIF file, the file can be five times smaller than the original true-color JPEG image. Binarization, or thresholding (Parker, 1997), is the process that converts an image into black and white: a threshold value is defined, and the colors above that value are converted into white, while the colors below it are converted into black. This is a very simple process in digital image processing when the document is written in black ink on white paper. Historical documents, however, exhibit several types of noise. Degradation yellows the sheet of paper and creates noise that is picked up by the digitizing process. Moreover, in some cases, the ink has faded.
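The thresholding rule described above can be illustrated with a minimal sketch. This is plain global thresholding on a grayscale array, not the genetic-algorithm method this paper proposes; the function name `binarize`, the pixel values, and the choice to map pixels equal to the threshold to black are assumptions made for illustration only.

```python
import numpy as np


def binarize(gray, threshold):
    """Global thresholding: values above `threshold` become white (255),
    all other values become black (0).

    Assumption: pixels exactly equal to the threshold are mapped to black;
    the text does not say how ties are handled.
    """
    gray = np.asarray(gray)
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


# Hypothetical grayscale values (0 = black, 255 = white).
# A clean document: bright paper (200+) and dark ink (around 40-60).
clean = np.array([[200, 210, 40],
                  [190, 60, 205]], dtype=np.uint8)
print(binarize(clean, 128))

# A degraded document: yellowed paper (140), faded ink (110),
# and ink showing through from the reverse side (125).
degraded = np.array([[140, 110, 125]], dtype=np.uint8)
print(binarize(degraded, 128))  # [[255   0   0]]
```

With the clean values, any threshold between the ink and paper levels separates the two classes. With the degraded values, the levels are so close that a threshold of 128 turns the show-through pixel black along with the faded ink, which is the kind of difficulty the abstract describes.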
Degradation is particularly problematic when the document is written on both sides of the paper. In some cases, the ink from one side shows through on the other, creating an effect called “ink bleeding”. Because of these problems, it is very difficult to find the threshold value that best separates the colors that belong to the paper from the colors that belong to the ink. An example of such a document is presented in Figure 1 (left). In this paper, we present a new thresholding algorithm for color quantization based on genetic algorithms and image fidelity metrics. These metrics are used to define the convergence point of the genetic algorithm. The quantized image is then binarized based on
