Abstract

In this paper, we tackle the problem of categorizing and identifying cross-depicted historical motifs using recent deep learning techniques, with the aim of developing a content-based image retrieval system. By cross-depiction, we mean that the same object can be represented (depicted) in various ways. The objects of interest in this research are watermarks, which are crucial for dating manuscripts. For watermarks, cross-depiction arises for two reasons: (i) there are many similar representations of the same motif, and (ii) there are several ways of capturing watermarks: because they are not visible on a scan or photograph, they are typically retrieved via hand tracing, rubbing, or special photographic techniques. This leads to different representations of the same (or similar) objects, making it hard for pattern recognition methods to recognize the watermarks. While this is a simple problem for human experts, computer vision techniques have difficulty generalizing across the various depictions. In this paper, we present a study in which we use deep neural networks to categorize watermarks at varying levels of detail. The macro-averaged F1-score on an imbalanced 12-category classification task is %, the multi-labelling performance (Jaccard Index) on a 622-label task is %. To analyze the usefulness of an image-based system for assisting humanities scholars in cataloguing manuscripts, we also measure the performance of similarity matching on expert-crafted test sets of varying sizes (50 and 1000 watermark samples). A significant outcome is that all relevant results belonging to the same super-class are found by our system (Mean Average Precision of 100%), despite the cross-depicted nature of the motifs. This result has not been achieved in the literature so far.

Highlights

  • The identification and retrieval of historical watermarks has been an important research field for codicology and paper history for a long time [1]

  • We report the results in terms of mean Average Precision, which is a well-established metric in the context of similarity matching

  • We showed that deep learning-based approaches can be very useful for classifying and analyzing watermarks in historical documents, despite the fact that the watermarks are depicted in various manners
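As an illustration of the mean Average Precision metric named in the highlights, the following is a minimal sketch, not the authors' evaluation code. The function names and the boolean relevance lists are hypothetical, and the sketch assumes that every relevant item for a query appears somewhere in its ranked result list, so the average is taken over the relevant items found.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average Precision for one ranked result list.

    relevance[i] is True if the result at rank i + 1 is relevant.
    Assumption: all relevant items for the query occur in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant items seen so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query Average Precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Hypothetical rankings: one list of relevance flags per query.
    perfect = [[True, True, False], [True, False, False]]
    mixed = [[True], [False, True]]
    print(mean_average_precision(perfect))
    print(mean_average_precision(mixed))
```

Under this definition, a score of 100% means that for every query all relevant results are ranked ahead of all non-relevant ones.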

Introduction

The identification and retrieval of historical watermarks has been an important research field for codicology and paper history for a long time [1]. The main use of watermark identification is the dating of historical papers, for example when cataloguing undated medieval manuscripts [2]. Watermark identification also addresses broader research questions [3,4], e.g., in economic history research. Watermarks are created during the process of handmade paper-making from tissue rags, as was done in Europe from the Middle Ages (13th century) until the mid-19th century [1]. The paper-making process involved plunging a mould into liquid tissue pulp.
