Abstract

As collections of archived digital documents continue to grow, the maintenance of an archive and the quality of reproduction from the archived format become important long‐term considerations. In particular, Adobe's portable document format (PDF) is now an important ‘final form’ standard for archiving and distributing electronic versions of technical documents. It is important that all embedded images in the PDF, and any fonts used for text rendering, should at the very minimum be easily readable on screen. Unfortunately, because PDF is based on PostScript technology, it allows the embedding of bitmap fonts in Adobe Type 3 format as well as higher‐quality outline fonts in TrueType or Adobe Type 1 formats. Bitmap fonts do not generally perform well when they are scaled and rendered on low‐resolution devices such as workstation screens.

The work described here investigates how a plug‐in to Adobe Acrobat enables bitmap fonts to be substituted by corresponding outline fonts using a checksum matching technique against a canonical set of bitmap fonts, as originally distributed. The target documents for our initial investigations are those PDF files produced by LaTeX systems when set up in a default (bitmap font) configuration. For all bitmap fonts where recognition exceeds a certain confidence threshold, replacement fonts in Adobe Type 1 (outline) format can be substituted, with consequent improvements in file size, screen display quality and rendering speed. The accuracy of font recognition is discussed together with the prospects of extending these methods to bitmap‐font PDF files from sources other than LaTeX. Copyright © 2003 John Wiley & Sons, Ltd.

Highlights

  • Over the past 10 years Adobe’s Portable Document Format (PDF) has become extremely popular as an archiving format, largely because of its PostScript-based architecture which allows complex material to be rendered at very high quality on page and on screen

  • A second important consideration is that PDF can give an accurate rendering of exactly what was published in hard-copy format with all layout, including page breaks, line breaks and so on, kept intact

  • Our ultimate goal is to enable font substitution in PDF files. The two sections show that replacing bitmap fonts with outline fonts, in any PostScript-based file format, is far from straightforward: scaling factors need to be calculated for the replacement font, and these depend on the original resolution of the bitmap fonts and on the innately different character cell sizes of Type 1 and Type 3 character glyphs

Summary

INTRODUCTION

Over the past 10 years Adobe’s PDF has become extremely popular as an archiving format, largely because of its PostScript-based architecture, which allows complex material to be rendered at very high quality on page and on screen. If a corpus of PDF documents is to be properly maintained then, in an ideal world, any upgrading of the PDF should be achieved by completely reprocessing the enhanced and amended source material. To this end, the publisher should archive all the source files together with all the processing software needed to transform the source to the PDF. Many of the preservation schemes for maintaining digital resources, especially ones based on the Open Archival Information System (OAIS) model [1], are only just beginning to address the problems of maintaining the necessary hardware and software resources to enable accurate replication of archived material over a period of time. For all these reasons, maintenance and upgrade of an electronic document archive will often have to be carried out on the final-form PDF only. The large amount of material available, and the fact that suitable replacement fonts could be identified, prompted us to test the feasibility of PDF font replacement on this (La)TeX-originated material.

TeX and LaTeX
Device independence and the dvips program
Reprocessing legacy material
EMERGE’s FixFont
The FixFont software
FixFont’s output
Font replacement in Acrobat
The FontRep plug-in for Acrobat
Other font replacement strategies
Problems encountered
Wider applicability
Conclusions
Acknowledgements