Abstract

Unconstrained off-line handwritten text recognition, in general and for Arabic-like scripts in particular, is a challenging task and remains an active research area. Transformer-based models for English handwriting recognition have recently shown promising results. In this paper, we explore the use of the transformer architecture for Urdu handwriting recognition. The highlights of the proposed work are the use of a convolutional neural network before a vanilla full transformer, and the use of Urdu printed text-line images alongside handwritten text-line images during training. The convolutional layers reduce the spatial resolution and thereby compensate for the \(n^{2}\) complexity of the transformer's multi-head attention layers. Moreover, the printed text images in the training phase help the model learn a greater number of ligatures (a prominent feature of Arabic-like scripts) and a better language model. Our model achieved state-of-the-art accuracy (CER of \(5.31\%\)) on the publicly available NUST-UHWR dataset (Zia et al. in Neural Comput Appl 34:1–14, 2021).
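The motivation for placing convolutional layers before the transformer can be quantified: because self-attention scores every token pair, its cost grows quadratically with sequence length, so downsampling the image width before attention yields a quadratic saving. The sketch below illustrates this arithmetic; the specific input width (1024 columns) and total CNN stride (8) are illustrative assumptions, not values taken from the paper.

```python
# Sketch: why convolutional downsampling helps transformer attention.
# Assumptions (illustrative, not from the paper): a text-line image
# 1024 columns wide, and a CNN stem with a total stride of 8.

def attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by one self-attention layer:
    grows quadratically (n^2) with sequence length."""
    return seq_len * seq_len

width = 1024           # hypothetical input width, in columns/tokens
stride = 8             # hypothetical total downsampling of the CNN stem
reduced = width // stride

print(attention_pairs(width))    # 1048576 pairs without the CNN
print(attention_pairs(reduced))  # 16384 pairs after downsampling
print(attention_pairs(width) // attention_pairs(reduced))  # 64x fewer
```

An 8x reduction in sequence length thus cuts the attention cost by a factor of 64, which is why even a shallow convolutional front end makes full self-attention over long text lines tractable.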
