Abstract

This work focuses on document fragment association using deep metric learning methods. More precisely, we are interested in ancient papyrus fragments that need to be reconstructed prior to their analysis by papyrologists. This task is challenging to automate with machine learning algorithms because labeled data is rare, often incomplete, imbalanced and in inconsistent states of conservation. However, there is a real need for such software in the papyrology community, as reconstructing papyri by hand is extremely time-consuming and tedious. In this paper, we explore ways in which papyrologists can obtain useful matching suggestions on new data using Deep Convolutional Siamese Networks. We emphasize low-to-no human intervention for annotating images. We show that the from-scratch self-supervised approach we propose is more effective than knowledge transfer from a large dataset, the former achieving a top-1 accuracy score of 0.73 on a retrieval task involving 800 fragments.

Highlights

  • Through the study of ancient documents, archaeologists want to understand how ancient human societies were organized

  • This approach still implies that some annotations are available on the target dataset to perform the fine-tuning, and since we want as little human annotation work as possible, we explore the idea of self-supervised learning

  • We propose a self-supervised Deep Metric Learning method that provides useful fragment association suggestions to papyrologists and does not need any manual annotation work.
      – We evaluate the proposal on two datasets, with two convolutional neural network architectures
      – We compare our self-supervised approach with a domain adaptation approach
      – We provide insight into how this could be useful for papyrologists in a realistic use case

Summary

Introduction and context

Through the study of ancient documents, archaeologists want to understand how ancient human societies were organized. With domain adaptation, the information learned on one dataset can be used on another dataset that has insufficient training data of its own. This approach still implies that some annotations are available on the target dataset to perform the fine-tuning, and since we want as little human annotation work as possible, we explore the idea of self-supervised learning.

The paper is organized as follows: in section 2, we give an overview of existing works that use DML, transfer learning and self-supervised approaches.

1. We provide a challenging and sizeable papyrus fragment dataset containing 4579 fragments constituting 1118 papyri, tailored for fragment retrieval tasks. It is based on the University of Michigan Papyrus Collection, pre-processed and ready to use.
2. We propose a self-supervised Deep Metric Learning method that provides useful fragment association suggestions to papyrologists and does not need any manual annotation work.
    – We evaluate the proposal on two datasets, with two convolutional neural network architectures
    – We compare our self-supervised approach with a domain adaptation approach
    – We provide insight into how this could be useful for papyrologists in a realistic use case
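
The abstract reports a top-1 accuracy score on a fragment retrieval task. As a rough illustration of what such a score measures, the sketch below counts how often the nearest neighbour of a fragment, in embedding space, comes from the same papyrus. It is a minimal sketch under stated assumptions, not the paper's evaluation code: the Euclidean distance, the leave-one-out query protocol, the function name `top1_accuracy` and the toy data are all hypothetical choices made for illustration.

```python
import math


def top1_accuracy(embeddings, papyrus_ids):
    """Fraction of fragments whose nearest other fragment shares their papyrus ID.

    Assumes at least two fragments and Euclidean distance between embeddings.
    """
    n = len(embeddings)
    hits = 0
    for i in range(n):
        # Nearest neighbour among all other fragments (the query itself is excluded).
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(embeddings[i], embeddings[j]),
        )
        hits += papyrus_ids[nearest] == papyrus_ids[i]
    return hits / n


# Toy data: four 2-D embeddings from two hypothetical papyri "A" and "B".
embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.2, 0.1)]
papyrus_ids = ["A", "A", "B", "B"]
score = top1_accuracy(embeddings, papyrus_ids)
```

In practice the embeddings would come from the trained network rather than being written by hand, and a vectorized distance computation would be preferable for hundreds of fragments.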

Related Works
Datasets
The Hisfrag database
Pre-processing the Michigan database
Global architecture
Testing different architectures
Batch constitution
Evaluation method
Hisfrag competition metrics
Metrics relevance for evaluating the task
Baseline results
Using a model trained on another dataset
Self-supervised learning
Estimating the quantity of mislabelings
Self-supervised learning vs domain adaptation
Exploring a more realistic use case
Conclusion and future works
