Abstract

This work addresses three important yet challenging problems of handwritten text understanding: word recognition, query-by-example (QBE) word spotting, and query-by-string (QBS) word spotting. In most existing approaches, these related tasks are considered independently. We propose a single unified framework based on deep learning to solve all three tasks efficiently and simultaneously. In this framework, an end-to-end deep neural network architecture is used for the joint embedding of handwritten word texts and images. Word images are embedded via a convolutional neural network (CNN), which is trained to predict a representation modeling character-level information. The output of the last convolutional layer is taken as the representation in the joint embedding subspace. Likewise, a recurrent neural network (RNN) is used to map a sequence of characters to the joint subspace representation. Finally, a model based on multi-layer perceptrons is proposed to predict the matching probability between two embedding vectors. Experiments on five databases of documents written in three languages show our method to yield state-of-the-art performance for QBE and QBS word spotting. The proposed method also obtains competitive results for word recognition, when compared against approaches tailored specifically for this task.
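The abstract describes a three-part architecture: a CNN image embedder, an RNN text embedder mapping into the same joint subspace, and an MLP matching head. The sketch below is a minimal illustration of that structure, not the authors' implementation; it assumes PyTorch, and all layer sizes, the embedding dimension, and the character vocabulary are placeholder choices.

```python
# Minimal sketch of the described joint-embedding setup (illustrative only):
# a CNN embeds word images, an RNN embeds character strings into the same
# joint subspace, and an MLP predicts the matching probability between two
# embedding vectors. EMBED_DIM and VOCAB_SIZE are assumed values.
import torch
import torch.nn as nn

EMBED_DIM = 128   # assumed size of the joint embedding subspace
VOCAB_SIZE = 40   # assumed character vocabulary size

class ImageEmbedder(nn.Module):
    """CNN mapping a grayscale word image to a joint-subspace vector."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        # Output of the last convolutional layer serves as the embedding.
        self.proj = nn.Conv2d(64, EMBED_DIM, 1)

    def forward(self, img):                      # img: (B, 1, H, W)
        return self.proj(self.features(img)).flatten(1)   # (B, EMBED_DIM)

class TextEmbedder(nn.Module):
    """RNN mapping a character sequence to the same joint subspace."""
    def __init__(self):
        super().__init__()
        self.char_emb = nn.Embedding(VOCAB_SIZE, 64)
        self.rnn = nn.LSTM(64, EMBED_DIM, batch_first=True)

    def forward(self, chars):                    # chars: (B, T) character ids
        _, (h, _) = self.rnn(self.char_emb(chars))
        return h[-1]                             # (B, EMBED_DIM)

class MatchMLP(nn.Module):
    """MLP predicting the probability that two embeddings match."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * EMBED_DIM, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, a, b):
        return self.net(torch.cat([a, b], dim=1)).squeeze(1)

# Usage example (QBS word spotting): score a query string against a word image.
img_vec = ImageEmbedder()(torch.rand(1, 1, 48, 160))
txt_vec = TextEmbedder()(torch.randint(0, VOCAB_SIZE, (1, 8)))
prob = MatchMLP()(txt_vec, img_vec)              # matching probability in [0, 1]
```

In this reading, QBE spotting compares two image embeddings, QBS spotting compares a text embedding to an image embedding, and recognition ranks lexicon strings by their matching probability against an image; the specific training objective is not detailed in the abstract.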
