An Arabic Script Recognition System

Yasser M. Alginahi, Mohammed Mudassar, Muhammad Nomani Kabir

doi:10.3837/tiis.2015.09.023

Abstract

A system for the recognition of machine-printed Arabic script is proposed. The Arabic script is shared by three languages, namely Arabic, Urdu and Farsi. These languages have a substantial amount of vocabulary in common, which compounds the identification problem. Ideally, therefore, the system must not only differentiate the script from other scripts but also recognize the language in which the script is written. The recognition process has two parts: segregating Arabic-script documents from Latin, Han and other script documents using horizontal and vertical projection profiles, and identifying the language. Language identification mainly involves extracting connected components, which are subjected to a Principal Component Analysis (PCA) transformation to obtain uncorrelated features. The traditional K-Nearest Neighbours (KNN) algorithm is then used for recognition. Experiments were carried out by varying the number of principal components and the number of connected components extracted per document, to find the combination of both that gives the best accuracy. An accuracy of 100% is achieved with 18 or more connected components and 15 principal components. The proposed system would play a vital role in the automatic archiving of multilingual documents and in the selection of the appropriate Arabic-script language in multilingual Optical Character Recognition (OCR) systems.
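The abstract names three computational steps: projection profiles for script segregation, PCA on connected components, and KNN classification. The Python/NumPy code that follows is a minimal sketch of those steps under stated assumptions; it is not the authors' implementation. All function names are hypothetical. Connected components are assumed to be already extracted (for example with `scipy.ndimage.label`) and size-normalised into fixed-length vectors, since the abstract does not say how components are encoded. Labelling a document by majority vote over its components is also an assumption. The abstract does not give the projection-profile decision rule, so the sketch only computes the profiles themselves. The demo data is synthetic.

```python
import numpy as np
from collections import Counter


def projection_profiles(binary_img):
    """Horizontal and vertical projection profiles of a binary page image
    (1 = ink, 0 = background): ink-pixel counts per row and per column."""
    img = np.asarray(binary_img, dtype=np.int64)
    return img.sum(axis=1), img.sum(axis=0)


def fit_pca(X, n_components):
    """Fit PCA by SVD of the mean-centred training matrix (one row per sample).
    Returns the mean vector and the top n_components principal axes."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes, sorted by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def transform_pca(X, mean, axes):
    """Project samples onto the retained axes; over the training set the
    resulting features are uncorrelated."""
    return (np.asarray(X, dtype=float) - mean) @ axes.T


def knn_predict(train_feats, train_labels, query_feats, k=1):
    """Plain KNN: Euclidean distance, majority vote among the k nearest samples."""
    preds = []
    for q in np.atleast_2d(query_feats):
        dists = np.linalg.norm(train_feats - q, axis=1)
        nearest = np.argsort(dists, kind="stable")[:k]
        votes = Counter(train_labels[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


def classify_document(components, mean, axes, train_feats, train_labels, k=1):
    """Hypothetical document-level rule: label each connected-component vector
    with KNN, then return the majority label across the document."""
    feats = transform_pca(components, mean, axes)
    labels = knn_predict(train_feats, train_labels, feats, k=k)
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic stand-in data: 64-dim vectors (e.g. 8x8 size-normalised
    # component images, flattened). Real features would come from scanned pages.
    rng = np.random.default_rng(0)
    centres = {"Arabic": 0.2, "Urdu": 0.5, "Farsi": 0.8}
    X_train, y_train = [], []
    for lang, c in centres.items():
        X_train.append(c + 0.05 * rng.standard_normal((30, 64)))
        y_train += [lang] * 30
    X_train = np.vstack(X_train)

    mean, axes = fit_pca(X_train, n_components=15)
    train_feats = transform_pca(X_train, mean, axes)

    # A "document" of 18 connected components drawn near the Urdu centre.
    doc = 0.5 + 0.05 * rng.standard_normal((18, 64))
    print(classify_document(doc, mean, axes, train_feats, y_train, k=3))
```

The demo uses 15 principal components and 18 components per document to mirror the best configuration reported in the abstract. The synthetic data says nothing about accuracy on real documents.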

Journal: KSII Transactions on Internet and Information Systems | Publication Date: Sep 30, 2015
Citations: 2

Similar Papers

Analysis of Segmentation Methods for Brahmi Script
Ajay Pratap Singh ... Ashwin Kumar Kushwaha
DESIDOC Journal of Library & Information Technology | VOL. 39 | 11 Mar 2019

A Histogram-Based Two-Stage Adaptive Character Segmentation for Transcription of Inter-Point Hindi Braille to Text
T Shreekanth ... V Udayashankara
International Journal of Image and Graphics | VOL. 15 | 11 Jun 2015

Discrimination Of English To Other Indian Languages (Kannada And Hindi) For OCR System
Ankit Kumar
International Journal of Computer Science, Engineering and Applications | VOL. 2 | 30 Apr 2012

A Robust OCR for Degraded Documents
Kapil Dev Dhingra ... Pramod Kumar Sharma
01 Jan 2008
