Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research

Laslo Dinges, Ayoub Al-Hamadi, Sherif El-Etriby, Moftah Elzobi

doi:10.3390/s16030346

Abstract

Document analysis tasks such as pattern recognition, word spotting, or segmentation require comprehensive databases for training and validation. Not only variations in writing style but also the word list used matters when training samples should reflect the input of a specific application area. However, generating training samples is expensive in terms of manpower and time, particularly if complete text pages including complex ground truth are required. This is why such databases are lacking, especially for Arabic, the second most popular language. Moreover, Arabic handwriting recognition involves different preprocessing, segmentation, and recognition methods. Each requires particular ground truth or samples to enable optimal training and validation, which are often not covered by the currently available databases. To overcome this issue, we propose a system that synthesizes Arabic handwritten words and text pages and generates corresponding detailed ground truth. We use these syntheses to validate a new, segmentation-based system that recognizes handwritten Arabic words. We found that a modification of an Active Shape Model based character classifier, which we proposed earlier, improves the word recognition accuracy. Further improvements are achieved by using a vocabulary of the 50,000 most common Arabic words for error correction.
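The vocabulary-based error correction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how a recognized word is matched against the vocabulary, so the Levenshtein edit distance used here is an assumption, as are the function names `edit_distance` and `correct_word`, the `max_distance` threshold, and the three-word toy vocabulary standing in for the 50,000-word list.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct_word(recognized: str, vocabulary: List[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the input if nothing is close enough.

    Hypothetical helper: the threshold and the tie-breaking rule (first best
    match in vocabulary order) are illustrative choices, not taken from the paper.
    """
    best = min(vocabulary, key=lambda w: edit_distance(recognized, w))
    if edit_distance(recognized, best) <= max_distance:
        return best
    return recognized


if __name__ == "__main__":
    toy_vocabulary = ["كاتب", "كتاب", "مكتب"]  # stand-in for a large word list
    print(correct_word("كتات", toy_vocabulary))
```

A linear scan like this costs one distance computation per vocabulary entry, which is acceptable for a sketch; a larger vocabulary would likely call for an index such as a trie or BK-tree.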

Highlights

Modern document analysis heavily depends on automated processes such as pattern recognition or segmentation
Using Support Vector Machines (SVMs), we achieved good results (95.4% ± 0.3% and 97.14% ± 0.06%), which are similar to those of our Active Shape Model (ASM) based approach
We have presented an efficient approach to synthesize Arabic handwritten words and text pages from Unicode

Summary

Introduction

Modern document analysis heavily depends on automated processes such as pattern recognition or segmentation. However, these processes need to be trained using a database and validated using corresponding, suitable ground truth (GT). The IESK-arDB database, which we proposed in [6], contains international town names and common terms, including GT for segmentation. Since both databases are limited in the number of samples and words and contain only single words or short sentences, we believe that automated generation of databases customized for specific research is a helpful complement. Advantages are that samples can be created quickly for any word or text at any time and that detailed GT is created simultaneously.

Arabic Script

Related Works

Synthesizing Arabic Handwriting Databases

Data Acquisition using Infrared and Ultrasonic Sensors

Active Shape Models

Word Sample Synthesis

Generation of Pseudo Texts

Extension of the IESK-arDB by Synthesised Samples

Segmentation based Recognition of Handwritten Arabic Words

Segmentation

Character Recognition

Decision Trees

Support Vector Machines

Word Recognition

Error Correction

Experimental Results

SVM based OCR

ASM-Based OCR

Character Level Word Correction

Word Level Error Correction

Computational Effort

Conclusions and Future Work

Journal: Sensors (Basel, Switzerland)	Publication Date: Mar 11, 2016
Citations: 7	License type: CC BY 4.0

Similar Papers

Arabic handwritten word spotting using language models
M Khayyat ... C Y Suen
01 Sep 2012

ASM Based Synthesis of Handwritten Arabic Text Pages
Laslo Dinges ... Ayoub Al-Hamadi
TheScientificWorldJournal | VOL. 2015
01 Jan 2015

On Arabic Abstract and Concrete Words Recall Using Free Recall Paradigms: Is It Abstractness, Concreteness, or Zero Effect
Nasser Saleh Al-Mansour
Psychology and Behavioral Sciences | VOL. 4
01 Jan 2015

Learning-based word spotting system for Arabic handwritten documents
Muna Khayyat ... Ching Y Suen
Pattern Recognition | VOL. 47
11 Sep 2013