Abstract

Electronic documents are widely used to store and share information such as bank statements, contracts, articles, maps and tax information. Many different applications exist for displaying a given electronic document, and users rightfully assume that documents will be rendered similarly regardless of the application used. However, this is not always the case, and these inconsistencies, whether caused by bugs in the application or in the file itself, can become critical sources of miscommunication. In this paper, we present a study on the correctness of PDF documents and readers. We start by manually investigating a large number of real-world PDF documents to understand the frequency and characteristics of cross-reader inconsistencies, and find that such inconsistencies are common: 13.5% of PDF files are inconsistently rendered by at least one popular reader. We then propose an approach to detect and localize the source of such inconsistencies automatically. We evaluate our automatic approach on a large corpus of over 230K documents using 11 popular readers, and our experiments have detected 30 unique bugs in these readers and files. We also reported 33 bugs, some of which have already been confirmed or fixed by developers.

Highlights

  • Many different applications exist for displaying a given type of electronic document, and inconsistencies between these applications can be critical sources of miscommunication

  • The Chrome and Mozilla support forums contain hundreds of complaints from users about Portable Document Format (PDF) files being displayed differently across readers. These issues include drug information sent to doctors that cannot be properly displayed or opened (Google Chrome Help Forum 2015), customers unable to read their online bills (Mozilla Support Forum 2013), and web designers worrying that customers cannot correctly display the PDF files on their websites (Chromium Bug Tracker 2016)

  • While the PDF specifications do not require font embedding, many publisher standards do


Summary

Introduction

Many different applications exist for displaying a given type of electronic document, and inconsistencies between these applications can be critical sources of miscommunication. Displaying an electronic file consistently across different readers is therefore crucial, yet many inconsistencies remain among PDF readers. The Chrome and Mozilla support forums contain hundreds of complaints from users about PDF files being displayed differently across readers. These issues include drug information sent to doctors that cannot be properly displayed or opened (Google Chrome Help Forum 2015), customers unable to read their online bills (Mozilla Support Forum 2013), and web designers worrying that customers cannot correctly display the PDF files on their websites (Chromium Bug Tracker 2016). There are two main causes of inconsistencies between PDF readers.

