Abstract

Handwritten text recognition (HTR) and word retrieval, also known as word spotting, are traditional problems in the document analysis community. While the use of increasingly large neural network architectures has led to a steady improvement in performance, it comes with the drawback of requiring manually annotated training data. This poses a tremendous problem when applying these models to new document collections. To overcome this drawback, we propose a self-training approach that makes it possible to train state-of-the-art models for HTR and word spotting. Self-training is a common technique in semi-supervised learning and usually relies on a small labeled dataset and on training with pseudo-labels generated by an initial model. In this work, we show that it is feasible to train models on synthetic data that are sufficiently performant to serve as initial models for self-training. The proposed training method therefore does not rely on any manually annotated samples. We further investigate the visual and language properties of the synthetic datasets. To improve the performance and robustness of the self-training approach, we propose different confidence measures for both models that allow erroneous pseudo-labels to be identified and removed. The presented training approach clearly outperforms other learning-free methods and adaptation strategies in the absence of manually annotated data.
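The core loop described above (predict pseudo-labels with an initial model, then keep only high-confidence ones for retraining) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the confidence threshold, and the toy "model" are all assumptions.

```python
def filter_pseudo_labels(predictions, threshold=0.9):
    """Keep only pseudo-labels whose confidence meets the threshold.

    `predictions` is a list of (sample, label, confidence) triples.
    """
    return [(x, label) for x, label, conf in predictions if conf >= threshold]


def self_training_round(model_predict, unlabeled, threshold=0.9):
    """One self-training round: predict pseudo-labels, then filter by confidence.

    `model_predict(x)` is assumed to return a (label, confidence) pair,
    e.g. from a model initially trained on synthetic data only.
    """
    predictions = [(x, *model_predict(x)) for x in unlabeled]
    return filter_pseudo_labels(predictions, threshold)


# Toy stand-in for an initial model: labels a word image (here just a string)
# by its first character and reports an artificial confidence score.
def toy_model(x):
    return x[0], 0.95 if len(x) > 3 else 0.5


# Only the high-confidence samples survive the filter and would be
# used as pseudo-labeled training data in the next round.
pseudo = self_training_round(toy_model, ["word", "ink", "letter"])
```

In a real system, the confidence measure would come from the recognition model itself (for example, the probability of the decoded transcription), which is exactly the role the paper's proposed confidence measures play.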
