Abstract

In the context of long-term archival of digital assets, file formats that are standardized and designed for longevity, such as PDF/A, are preferred. However, due to the complexity of and ambiguities in the PDF standards, it is far from trivial either to create standard-conformant files or to check the conformance of any given file. This study investigates the challenges of checking real-world PDF files, provided by public sector organizations and meant for long-term archival, for PDF/A conformance. Results show that only a small share of the PDF files claim to conform to the PDF/A-1b specification variant, and even fewer pass the conformance checks of various conformance checking tools. Challenges for conformance checking tools include both ambiguities in the standards’ technical specifications and limitations in the tools’ implementations.

Highlights

  • The process of long-term maintenance of digital assets for use and re-use imposes a number of challenges, including the limitations of storage technologies and the choice of future-proof file formats

  • This study investigates the following research questions related to the long-term archival of PDF/A files by public sector organizations: RQ 1: What characterizes PDF files provided by public sector organizations? RQ 2: How successful are public sector organizations at providing PDF/A-1b-conformant files? RQ 3: How and why does the outcome of assessments of PDF/A-1b conformance for files differ between conformance checking tools?

  • Concerning the first question, it was found that the majority of doctoral dissertations published in PDF format do not claim to adhere to any PDF/A standard, despite the expectation that these files are meant for long-term archival
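
To make concrete what it means for a file to "claim" PDF/A conformance: a PDF/A file declares its part and conformance level in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The following Python sketch reads that declaration from the raw bytes of a file. It is an illustration only, not the procedure used in the study; the function names are hypothetical, and it assumes the XMP packet is stored uncompressed in the file. A declared level is only a claim; whether the file actually conforms must be established by a conformance checking tool.

```python
import re
from typing import Optional


def _pdfaid_value(data: bytes, name: bytes) -> Optional[str]:
    """Return the value of the XMP property pdfaid:<name>, or None if absent.

    XMP allows two serializations, and both are handled here:
      attribute form:  pdfaid:part="1"
      element form:    <pdfaid:part>1</pdfaid:part>
    """
    attr = re.search(rb"pdfaid:" + name + rb"\s*=\s*[\"']([^\"']*)[\"']", data)
    if attr is not None:
        return attr.group(1).decode("ascii", "replace")
    elem = re.search(
        rb"<pdfaid:" + name + rb"\s*>\s*([^<\s]*)\s*</pdfaid:" + name + rb"\s*>",
        data,
    )
    if elem is not None:
        return elem.group(1).decode("ascii", "replace")
    return None


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the declared level (e.g. 'PDF/A-1b'), or None if no claim is found.

    Assumption: the XMP packet is readable in the raw bytes, i.e. the
    metadata stream is not compressed. This only reports what the file
    declares; it performs no conformance checking.
    """
    part = _pdfaid_value(data, b"part")
    if not part:
        return None
    conformance = _pdfaid_value(data, b"conformance") or ""
    return f"PDF/A-{part}{conformance.lower()}"
```

For a file on disk, the function could be called as `claimed_pdfa_level(open(path, "rb").read())`, where `path` is a placeholder for the file to inspect. A result of `None` means that no declaration was found in the raw bytes, which is the situation the highlight above describes for most dissertations.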


Summary

Introduction

The process of long-term maintenance of digital assets for use and re-use imposes a number of challenges, including the limitations of storage technologies and the choice of future-proof file formats. In the context of the latter challenge, digital archives, for example, must be able to handle a number of different media formats such as audio or video recordings or textual documents. The study discusses a number of reasons why such incompatibilities arise: prominent causes include comprehensiveness (the standard is too ‘big’), number of choices (the standard may allow competing or contradictory alternatives), ambiguities in terminology (the standard’s textual representation is hard to interpret), and feature overload (the standard includes functionality that is not relevant for many users and is left out of many implementations for economic reasons). Standard implementation is further hindered by the required compatibility with ‘buggy’ implementations predating the standard and by the omission of information necessary to understand the standard’s specification.

