Synthetic data for Arabic OCR system development

V Margner, M Pechwitz

doi:10.1109/icdar.2001.953967

Abstract

A system is presented for the automatic generation of synthetic databases for developing or evaluating Arabic word or text recognition systems (Arabic OCR). The proposed system works without any scanning of printed paper. First, Arabic text is typeset using a standard typesetting system. Second, a noise-free bitmap of the document and the corresponding ground truth (GT) are generated automatically. Finally, an image distortion can be superimposed on the character or word image to simulate the real-world noise expected in the intended application. All necessary modules are presented together with some examples. Special problems caused by specific features of Arabic, such as printing from right to left, many diacritical points, variation in the height of characters, and changes in position relative to the writing line, are discussed. The synthetic data set was used to train and test, on Arabic printed words, a recognition system based on hidden Markov models (HMMs) that was originally developed for German cursive script. Recognition results with different synthetic data sets are presented.
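
The final step of the pipeline described in the abstract, superimposing a distortion on a noise-free bitmap while keeping its ground truth attached, can be illustrated with a minimal sketch. The abstract does not specify the distortion model, so salt-and-pepper noise is used here purely as a stand-in. The function names (`add_salt_and_pepper`, `make_sample`), the `flip_prob` parameter, the toy bitmap, and the label are all assumptions made for illustration and are not taken from the paper.

```python
import random


def add_salt_and_pepper(bitmap, flip_prob, seed=0):
    """Return a copy of a binary bitmap with each pixel flipped
    independently with probability flip_prob.

    Assumed stand-in for the paper's image distortion step; the
    actual distortion model is not given in the abstract.
    """
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must be in [0, 1]")
    rng = random.Random(seed)  # seeded so a data set can be regenerated
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in bitmap
    ]


def make_sample(clean_bitmap, ground_truth, flip_prob, seed=0):
    """Pair a distorted image with the unchanged ground-truth label."""
    return {
        "image": add_salt_and_pepper(clean_bitmap, flip_prob, seed),
        "gt": ground_truth,
    }


# Hypothetical noise-free bitmap (1 = ink, 0 = background). In the
# described system this would come from rendering typeset Arabic text.
clean = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# Hypothetical ground-truth label: the Arabic word "kitab", written
# with escapes so the source stays ASCII.
label = "\u0643\u062a\u0627\u0628"

sample = make_sample(clean, label, flip_prob=0.05, seed=42)
for row in sample["image"]:
    print("".join("#" if px else "." for px in row))
```

Because the label is produced alongside the clean image and is never touched by the distortion, data sets at several noise levels can be generated from the same typeset source by varying `flip_prob` and `seed`.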
