Abstract

In this paper, we address the problem of offline handwritten text recognition (HTR) in historical documents when few labeled samples are available and some of the training labels contain errors. Three main contributions are developed. First, we analyze how to perform transfer learning (TL) from a massive database to a smaller historical database, studying which layers of the model need fine-tuning. Second, we analyze methods to efficiently combine TL and data augmentation (DA). Finally, an algorithm to mitigate the effects of incorrect labels in the training set is proposed. The methods are evaluated on the ICFHR 2018 competition databases, Washington, and Parzival. Combining all these techniques, we demonstrate a remarkable reduction of the character error rate (CER), up to 6% in some cases, on the test set with little complexity overhead.
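As a rough illustration of the layer-wise fine-tuning idea in the first contribution, the sketch below assumes a PyTorch CRNN-like model whose convolutional feature extractor is exposed as model.conv; the attribute name, the split point, and the learning rate are assumptions for illustration, not the configuration reported in the paper.

    # Hypothetical sketch: transfer learning by freezing the earliest convolutional
    # blocks (learned on the large source database) and fine-tuning the remaining
    # layers on the small historical target database. Names and hyperparameters
    # are illustrative, not the paper's settings.
    import torch

    def prepare_for_finetuning(model, n_frozen_blocks=2, lr=1e-4):
        """Freeze the first blocks of model.conv and return an optimizer
        over the remaining trainable parameters."""
        blocks = list(model.conv.children())
        for block in blocks[:n_frozen_blocks]:
            for p in block.parameters():
                p.requires_grad = False          # keep source-domain filters fixed
        trainable = [p for p in model.parameters() if p.requires_grad]
        return torch.optim.Adam(trainable, lr=lr)

Varying n_frozen_blocks from "freeze everything but the classifier" down to "fine-tune the whole network" is one simple way to probe which layers actually need adaptation to the target database.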

Highlights

  • The transcription of historical manuscripts is paramount for a better understanding of our history, as it allows direct access to the contents, greatly facilitating searches and studies

  • DATABASES In this paper we focus on handwritten text recognition (HTR) over eight databases: IAM [41], Washington [42], Parzival [42], and the five provided in the ICFHR 2018 Competition [5]

  • THE CORRUPTED LABEL PURGING (CLP) ALGORITHM We focus on the impact of the number of training lines and their label quality in the target dataset on the learning process of the deep neural network (DNN) model; a generic illustration of this kind of label purging is sketched after this list
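The highlights do not spell out the CLP algorithm itself. As a generic, hypothetical illustration of purging suspect labels (not necessarily the authors' CLP), the sketch below scores each training line with the current model's CTC loss and discards the highest-loss fraction, on the assumption that corrupted transcriptions tend to produce outlier losses; the drop fraction and the data layout are assumptions.

    # Generic, hypothetical label-purging sketch (not the paper's exact CLP algorithm):
    # lines whose CTC loss under the current model is unusually high are treated as
    # likely mislabeled and removed from the training set.
    import torch

    @torch.no_grad()
    def purge_suspect_lines(model, dataset, ctc_loss, drop_fraction=0.1):
        """Return the indices of the lines kept after dropping the
        highest-loss fraction of the training set."""
        scored = []
        for idx, (image, target, input_len, target_len) in enumerate(dataset):
            log_probs = model(image.unsqueeze(0))      # (1, T, n_classes), log-probs
            log_probs = log_probs.permute(1, 0, 2)     # (T, 1, n_classes) for CTCLoss
            loss = ctc_loss(log_probs, target.unsqueeze(0),
                            input_len.unsqueeze(0), target_len.unsqueeze(0))
            scored.append((loss.item(), idx))
        scored.sort(reverse=True)                      # largest loss first
        dropped = {idx for _, idx in scored[:int(drop_fraction * len(scored))]}
        return [i for i in range(len(dataset)) if i not in dropped]

After purging, training would continue on the retained lines only; the exact criterion used by CLP should be taken from the full text.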


Summary

INTRODUCTION

The transcription of historical manuscripts is paramount for a better understanding of our history, as it allows direct access to the contents, greatly facilitating searches and studies. Regarding the state-of-the-art DNN models for HTR, some recent works avoid recurrence in the models, developing architectures based on fully convolutional networks such as the (Gated) Fully Convolutional Networks, (G)FCN [23]–[26]. This kind of model reduces the number of parameters in the architecture. In [4], the authors improve performance by augmenting the training set with specially crafted multiscale data. They propose a model-based normalization scheme that considers the variability in the writing scale at the recognition phase.

ARCHITECTURE
In this work, we implement a network architecture based on the convolutional recurrent neural network (CRNN) presented in [39]. This approach avoids the use of two-dimensional LSTM (2D-LSTM) layers, applying convolutional layers as feature extractors and a stack of 1D bidirectional LSTM (BLSTM) layers to perform classification. For the connectionist temporal classification (CTC) output we tried best path decoding and beam search decoding, with no significant improvement from the latter, despite its higher computational complexity.
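As a hedged sketch of this kind of architecture, assuming PyTorch, the model below applies a convolutional feature extractor followed by a stack of 1D BLSTM layers and a per-timestep classifier for CTC training; the number of layers, filter counts, and hidden sizes are illustrative rather than the exact configuration of [39].

    # Illustrative CRNN for line-level HTR: conv feature extractor -> 1D BLSTM stack
    # -> per-timestep class scores for CTC. Sizes are placeholders, not the paper's.
    import torch
    import torch.nn as nn

    class CRNN(nn.Module):
        def __init__(self, n_classes, img_height=64, n_channels=1, hidden=256):
            super().__init__()
            # Convolutional feature extractor: shrinks the height, keeps a wide time axis.
            self.conv = nn.Sequential(
                nn.Conv2d(n_channels, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
                nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1), (2, 1)),
            )
            feat_dim = 256 * (img_height // 8)
            # Stack of 1D bidirectional LSTMs over the horizontal (time) axis.
            self.blstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                                 bidirectional=True, batch_first=True)
            # Per-timestep classifier; n_classes includes the CTC blank symbol.
            self.fc = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):                      # x: (B, C, H, W)
            f = self.conv(x)                       # (B, 256, H/8, W/4)
            b, c, h, w = f.shape
            f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # (B, T, feat_dim)
            y, _ = self.blstm(f)
            return self.fc(y).log_softmax(-1)      # (B, T, n_classes) log-probs

Training would pair these log-probabilities, transposed to time-major order, with nn.CTCLoss; decoding can then use greedy best-path or a beam search over the per-timestep distributions.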

DATABASES
In this paper we focus on HTR over eight databases.
ON THE DATA AUGMENTATION AND TRANSFER LEARNING TRADEOFF
Findings
CONCLUSION