Abstract

The popularity of printing devices has multiplied the diffusion of printed documents, raising concerns regarding the security and integrity of their content. The same devices that print legitimate contracts, newspapers, and other documents can also be used for malicious purposes, such as printing counterfeit money, forging contracts, and producing illegal packaging. This calls for the development of image forensics techniques to pinpoint criminal printed materials and trace them back to their origin. Despite some recent advances, previous works have modeled this problem as a closed-set classification task that relies on large amounts of data. In this work, we address the source linking problem of printed color documents by treating it as a verification problem. Specifically, we aim to decide whether two documents have been printed by the same printer. To achieve this goal, and to cope with the data scarcity that derives from the difficulty of gathering massive amounts of printed and scanned documents, we propose an ensemble of Siamese Neural Networks with novel architectures expressly designed to work with a small training dataset. As a further distinctive feature, the proposed approach is suited to an open-set scenario, in which the printers used to produce the documents analyzed at test time are not included in the training set. Results obtained under both open-set and closed-set conditions, with a thorough comparison against available baseline methods, show classification performance higher than 97% in the closed-set scenario and higher than 86% in the open-set case, highlighting the practicality of such approaches in real-world scenarios.
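A minimal sketch of how an ensemble could turn per-pair scores into a same-printer decision follows. It assumes, hypothetically, that each trained Siamese member outputs a probability that two documents share a printer, and that the ensemble averages these probabilities and thresholds them at 0.5. The abstract does not specify the fusion rule, and `ensemble_same_printer` and `PairScorer` are illustrative names only.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a trained Siamese member maps a pair of document
# feature vectors to the probability that both came from the same printer.
PairScorer = Callable[[Sequence[float], Sequence[float]], float]


def ensemble_same_printer(
    members: Sequence[PairScorer],
    doc_a: Sequence[float],
    doc_b: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    """Fuse the members' same-printer probabilities for one document pair.

    Averaging and the 0.5 threshold are assumptions for illustration,
    not the fusion rule reported by the authors.
    """
    if not members:
        raise ValueError("the ensemble needs at least one member")
    score = mean(member(doc_a, doc_b) for member in members)
    return score >= threshold, score


if __name__ == "__main__":
    # Toy stand-ins for trained networks; they ignore their inputs.
    toy_members = [lambda a, b: 0.9, lambda a, b: 0.7, lambda a, b: 0.2]
    same, score = ensemble_same_printer(toy_members, [0.1, 0.4], [0.2, 0.3])
    print(same, round(score, 2))  # True 0.6
```

Averaging is only one possible fusion rule; a majority vote over the thresholded member outputs would be an equally simple alternative under the same assumptions.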

Highlights

  • Despite tremendous efforts to replace printed documents with their digital counterparts, printed documents are still common and can be found everywhere

  • To develop a printed document verification system expressly designed to cope with the data scarcity problem typical of this scenario, we propose an ensemble of Siamese Neural Networks (SNNs) with novel architectures characterized by shallow-but-wide topologies

  • Siamese Networks have the following advantages over common Convolutional Neural Networks (CNNs) for our specific problem: (i) they learn to find similarities or dissimilarities between documents according to their printing source, i.e., they learn to detect whether the patterns introduced during the printing process originate from the same printer; (ii) Siamese Neural Networks are known for their one-shot learning capabilities [56]–[58], so they do not require many samples to learn successfully; and (iii) since they learn whether a pair matches regardless of the class labels, they can be used in an open-set modality to classify samples produced by printers that do not belong to the training set
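The pair-based learning described in the highlights can be sketched as follows. This is a minimal pure-Python illustration, not the authors' architecture: a toy linear `embed` function (hypothetical) stands in for the shallow-but-wide CNN branches, the Euclidean distance and contrastive loss are common choices for Siamese training that are assumed here, and `make_pairs` and `open_set_split` are hypothetical helpers. It shows three points: both inputs share one set of weights, labeling pairs rather than single samples turns n scans into n(n-1)/2 training pairs, and an open-set split keeps test printers disjoint from training printers.

```python
import itertools
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def embed(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Toy linear embedding standing in for one CNN branch (hypothetical)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def siamese_distance(weights: Sequence[Sequence[float]],
                     x1: Sequence[float], x2: Sequence[float]) -> float:
    """Embed both inputs with the SAME weights (the Siamese property)
    and return the Euclidean distance between the two embeddings."""
    return math.dist(embed(weights, x1), embed(weights, x2))


def contrastive_loss(distance: float, same_printer: bool,
                     margin: float = 1.0) -> float:
    """Contrastive loss (assumed here): pull same-printer pairs together,
    push different-printer pairs at least `margin` apart."""
    if same_printer:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


def make_pairs(samples: Dict[str, List[Vector]]) -> List[Tuple[Vector, Vector, bool]]:
    """Hypothetical helper: label every pair of scans by whether both
    come from the same printer. n scans yield n * (n - 1) / 2 pairs."""
    flat = [(printer, x) for printer, xs in samples.items() for x in xs]
    return [(xa, xb, pa == pb)
            for (pa, xa), (pb, xb) in itertools.combinations(flat, 2)]


def open_set_split(printers: Sequence[str], n_test: int,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: hold out whole printers so that no test
    printer is seen during training (open-set protocol)."""
    shuffled = sorted(printers)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    d = siamese_distance(identity, [0.0, 0.0], [3.0, 4.0])
    print(d)  # 5.0
    print(contrastive_loss(d, same_printer=True))   # 12.5
    print(contrastive_loss(d, same_printer=False))  # 0.0

    scans = {p: [[float(i), 0.0] for i in range(4)] for p in ("A", "B", "C")}
    pairs = make_pairs(scans)
    print(len(pairs), sum(label for _, _, label in pairs))  # 66 18

    train, test = open_set_split(["p1", "p2", "p3", "p4", "p5"], n_test=2)
    print(set(train) & set(test))  # set()
```

Training on pairs is what lets a small dataset go further: the 12 toy scans above already yield 66 labeled pairs. Because the network only learns whether two inputs match, the same distance can be computed for pairs drawn from printers held out by `open_set_split`.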

Introduction

Despite tremendous efforts to replace printed documents with their digital counterparts, printed documents are still common and can be found everywhere. Generating printed documents is very cheap, and printing devices are more accessible and easier to use than ever. This accessibility and ease of use play a major role in the production of the vast amounts of printed documents we see every day. From books to newspapers, from magazines to contracts and product packaging, some printing technology is always involved. Notwithstanding the wide diffusion of printed information, the lack of regulation and forensic procedures for printed documents allows counterfeiters and other criminals to use printing technology for malicious purposes. Moreover, the analysis of printed documents related to other criminal activities, such as corruption or terrorism, may help trace those activities back to their perpetrators.
