Self-supervised learning for medieval handwriting identification: A case study from the Vatican Apostolic Library

Lorenzo Lastilla, Serena Ammirati, Donatella Firmani, Nikos Komodakis, Paolo Merialdo, Simone Scardapane

doi:10.1016/j.ipm.2022.102875

Open Access

https://doi.org/10.1016/j.ipm.2022.102875

Abstract

In this paper, we consider the task of automatically identifying whether different parts of medieval and modern manuscripts can be traced back to the same copyist (scribe), a problem of significant interest in paleography. To date, the application of deep learning techniques to scribe recognition has been hindered by the lack of a sufficiently large labeled dataset, since the labeling process is complex and time-consuming. Here, we propose the first successful application of the recent framework of self-supervised learning to digital paleography: we pretrain a convolutional neural network on large amounts of unlabeled manuscripts. To this end, we build a novel dataset for copyist identification, consisting of both labeled and unlabeled manuscripts from the Vatican Apostolic Library. We show that fine-tuning this pretrained model on the task of interest significantly outperforms other baselines, including the common setups of initializing the network from general-domain features and training the model from scratch, also in terms of generalization power. Overall, our results reveal the strong potential of self-supervised techniques in digital paleography, where unlabeled data (i.e., digitized manuscripts) is now available while labeled data remains scarce.

Full Text