Abstract

The Lister Hill National Center for Biomedical Communications, the research division of the National Library of Medicine, is studying the application of Electronic Document Storage and Retrieval (EDSR) systems to a library environment. For this purpose, an EDSR prototype has been built and is in use as a laboratory test-bed. The system consists of CCD scanners for document digitization, high-resolution CRT document displays, hardcopy output devices, and optical and magnetic disk storage devices, all under the control of a PDP-11/44 computer. Before storage and transmission, the captured document images undergo processing operations that enhance their quality, eliminate degradations, and remove redundancy. It is postulated that a preprocessing stage that removes extraneous material from the raw image data could improve the performance of these processing operations. The operation selected to test this hypothesis is image compression, which is important for economically extending on-line image storage capacity and increasing image transfer speed in the EDSR system. The technique selected for implementation is one-dimensional run-length coding (CCITT Recommendation T.4), because it is an established standard and is appropriate as a baseline system. The preprocessing operations applied to the raw image data are page centering and border removal: the page images, approximately 6 by 9 inches in the examples chosen, are first centered in an 8.5 by 11 inch field, and the noisy border areas are then made white. These operations are performed electronically in a digital memory under operator control. For a selected set of pages, consisting mostly of title pages and tables of contents, compression ratios improve on average by a factor of more than 3.
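To illustrate why whitening the border should help a one-dimensional run-length coder, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a bilevel image stored as lists of 0 (white) and 1 (black) pixels, splits each scanline into alternating white/black runs starting with a white run (as T.4 1-D coding does), and uses the total run count as a rough stand-in for the number of Modified Huffman codewords; it does not emit actual T.4 codes. The helpers `center_and_whiten` and `make_scan` and the synthetic noisy page are hypothetical and exist only for illustration.

```python
from typing import List

# Assumed pixel convention (not from the paper): 0 = white, 1 = black.
Image = List[List[int]]


def row_runs(row: List[int]) -> List[int]:
    """Split one scanline into alternating white/black run lengths.

    As in T.4 1-D coding, every line starts with a white run, which has
    length 0 when the line begins with a black pixel.
    """
    runs: List[int] = []
    color, length = 0, 0
    for px in row:
        if px == color:
            length += 1
        else:
            runs.append(length)
            color, length = px, 1
    runs.append(length)
    return runs


def total_runs(img: Image) -> int:
    """Number of runs on the page: a rough proxy for the number of MH codewords."""
    return sum(len(row_runs(row)) for row in img)


def center_and_whiten(img: Image, top: int, left: int, height: int, width: int,
                      out_h: int, out_w: int) -> Image:
    """Hypothetical helper: copy the content window of `img` into the centre of
    an all-white out_h x out_w page, so everything outside the window is white."""
    if height > out_h or width > out_w:
        raise ValueError("content window does not fit in the output page")
    out = [[0] * out_w for _ in range(out_h)]  # start from a blank (white) page
    off_r, off_c = (out_h - height) // 2, (out_w - width) // 2
    for r in range(height):
        for c in range(width):
            out[off_r + r][off_c + c] = img[top + r][left + c]
    return out


def make_scan(h: int, w: int, margin: int) -> Image:
    """Hypothetical test page: speckle noise in the border band, striped 'text' inside."""
    img = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            in_margin = r < margin or r >= h - margin or c < margin or c >= w - margin
            if in_margin and (7 * r + 3 * c) % 5 == 0:
                img[r][c] = 1  # border speckle
            elif not in_margin and r % 3 == 0 and c % 4 != 0:
                img[r][c] = 1  # text-like strokes
    return img


raw = make_scan(20, 24, margin=3)
# Content window = everything inside the 3-pixel noisy margin (14 x 18 pixels),
# centred in a larger 22 x 26 page (a scaled-down stand-in for the 8.5 x 11 in field).
cleaned = center_and_whiten(raw, top=3, left=3, height=14, width=18, out_h=22, out_w=26)
print("runs before:", total_runs(raw), "after:", total_runs(cleaned))
```

On this toy page, each speckle pixel in the border splits a white run into three runs, whereas after whitening every blank line collapses to a single white run. Real scans, and the compression gains reported above, depend on the actual T.4 codeword lengths and page content, which this sketch does not model.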
