BPTI: Bilingual Printed Text Images Dataset for Recognition Purposes

Mohammed Yahia, Husni Al-Muhtaseb

doi:10.34028/iajit/20/4/12

Abstract

Datasets of text images are important for optical text recognition systems, where they can be used to improve performance and recognition rates. In this research work, we present a bilingual dataset of Arabic/English text images to address the lack of available bilingual text databases. The dataset consists of 97,812 text images in two groups: scanned page images and digitized line images. Images in both groups use 10 fonts and four font sizes, and are prepared or scanned at four resolutions (dpi). The dataset preparation process comprises text collection, text editing, image construction, and image processing. The dataset can be used for optical text recognition, optical font recognition, language identification, and segmentation. Several text recognition and language identification experiments were conducted on images from the dataset using a Hidden Markov Model (HMM) classifier. In the digitized-image recognition experiments, the best recognition correctness achieved was 99.01% and the best accuracy was 99.01%, and Tahoma was the font with the highest recognition rates. In the scanned-image recognition experiments, Tahoma again showed the highest performance, with 97.86% correctness and 97.73% accuracy. In the language identification experiments, Tahoma achieved a word-language identification rate of 99.98%.
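
The abstract reports separate correctness and accuracy figures (for example, 97.86% versus 97.73% for scanned Tahoma images). A common convention for HMM-based recognizers, used by the HTK toolkit's HResults tool, defines correctness as H/N and accuracy as (H - I)/N, where N is the number of reference units, H = N - D - S is the number of hits, and D, S, and I count deletions, substitutions, and insertions. The abstract does not state which definitions the authors used, so the following Python sketch simply assumes this convention. It aligns a recognized string against its reference with a unit-cost Levenshtein alignment (a simplification; HResults uses its own weighted alignment) and then computes both scores. The function names align_counts and correctness_and_accuracy are hypothetical, chosen for illustration.

```python
from typing import Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) from a unit-cost
    Levenshtein alignment of the hypothesis against the reference.
    Hypothetical helper: not taken from the BPTI paper."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edit cost to turn ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)

    # Walk back from the bottom-right cell, counting each error type.
    s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            s += int(ref[i - 1] != hyp[j - 1])  # match (0) or substitution (1)
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            d += 1  # reference unit missing from the hypothesis
            i -= 1
        else:
            ins += 1  # extra unit in the hypothesis
            j -= 1
    return s, d, ins


def correctness_and_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, float]:
    """Percent correctness H/N and accuracy (H - I)/N, assuming HTK-style
    definitions (an assumption; the abstract does not define them)."""
    n = len(ref)
    if n == 0:
        raise ValueError("reference must be non-empty")
    s, d, ins = align_counts(ref, hyp)
    hits = n - s - d
    return 100.0 * hits / n, 100.0 * (hits - ins) / n


if __name__ == "__main__":
    # One inserted character: correctness ignores it, accuracy penalizes it.
    print(correctness_and_accuracy("abcd", "abxcd"))  # (100.0, 75.0)
```

Because the two scores differ only by the insertion term, accuracy under this convention can never exceed correctness, and a small gap such as 97.86% versus 97.73% would point to relatively few spurious insertions.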

More From: The International Arab Journal of Information Technology

Similar Papers

A novel statistical feature extraction method for textual images: Optical font recognition
Bilal Bataineh ... Khairuddin Omar
Expert Systems With Applications, Vol. 39, 28 Nov 2011

Application of Geometry Rectification to Deformed Characters Recognition
Honghui Fan ... Liqun Wang
01 Jan 2015

Recognition of Printed Text Based on Hidden Markov Model
Ghaydaa Al-Talib ... Armanesa Hasson
AL-Rafidain Journal of Computer Sciences and Mathematics, Vol. 7, 01 Dec 2010

Off-line character recognition using HMM by multiple directional feature extraction and voting with bagging algorithm
H Nishimura ... Y Nakano
01 Jan 1998
