Natural Language Morphology Integration in Off-Line Arabic Optical Text Recognition

Slim Kanoun, Adel M Alimi, Yves Lecourtier

doi:10.1109/tsmcb.2010.2072990

Abstract

In this paper, we propose a new linguistically based approach, called the affixal approach, for Arabic word and text image recognition. Most existing work in the field integrates knowledge of the Arabic language into the recognition process in one of two ways: either after recognition, using a dictionary of words to validate the word hypotheses suggested by the OCR, or during recognition (recognition directed by a lexicon), using a statistical model of the language (Hidden Markov Model or N-gram). The proposed approach uses the linguistic concepts of the vocabulary to direct and simplify the recognition process. Its principal contribution is the ability to categorize word hypotheses as either derived or not derived from roots and to characterize each word hypothesis morphologically, which prepares the text hypotheses for later analyses (for example, syntactic analysis to filter the sentence hypotheses).
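
The idea of categorizing word hypotheses as derived or not derived from roots, and attaching a morphological description to each, can be illustrated with a small sketch. This is not the authors' implementation: the affix lists, the stem-to-root table, the transliterated toy words, and the names `analyse` and `filter_hypotheses` are all assumptions made up for illustration, and a real system would work from the recognizer's output and a full Arabic morphological lexicon.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Toy, transliterated inventories (hypothetical, for illustration only).
PREFIXES = ("", "al", "wa", "bi")
SUFFIXES = ("", "at", "un", "ha")
DERIVED_STEMS = {"katib": "ktb", "maktab": "ktb", "daris": "drs"}  # stem -> root
NON_DERIVED_WORDS = {"min", "fi", "huwa"}  # words assumed not built on a root


@dataclass
class Analysis:
    word: str
    category: str  # "derived", "non-derived" or "unknown"
    prefix: str = ""
    stem: str = ""
    suffix: str = ""
    root: Optional[str] = None


def analyse(word: str) -> Analysis:
    """Categorize one word hypothesis and describe its affixes, stem and root."""
    if word in NON_DERIVED_WORDS:
        return Analysis(word, "non-derived", stem=word)
    # Try every prefix/suffix pair and look the remaining stem up in the table.
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            if not (word.startswith(prefix) and word.endswith(suffix)):
                continue
            stem = word[len(prefix): len(word) - len(suffix)]
            if stem in DERIVED_STEMS:
                return Analysis(word, "derived", prefix, stem, suffix,
                                DERIVED_STEMS[stem])
    return Analysis(word, "unknown")


def filter_hypotheses(
    hypotheses: List[Tuple[str, float]]
) -> List[Tuple[Analysis, float]]:
    """Keep (word, score) hypotheses that receive an analysis, best score first."""
    analysed = [(analyse(word), score) for word, score in hypotheses]
    kept = [(a, s) for a, s in analysed if a.category != "unknown"]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = [("almaktubat", 0.7), ("almaktabat", 0.6), ("min", 0.2)]
    for analysis, score in filter_hypotheses(candidates):
        print(score, analysis)
```

In this sketch a hypothesis that cannot be decomposed is simply dropped, even when the recognizer scored it highest. The surviving hypotheses carry their prefix, stem, suffix, and root, which is the kind of information a later syntactic filter could use.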

Journal: IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
Publication Date: Sep 30, 2010
Citations: 48

Similar Papers

Arabic Handwritten Word Recognition Using HMMs with Explicit State Duration
A Benouareth ... A Ennaji
EURASIP Journal on Advances in Signal Processing | VOL. 2008
29 Nov 2007

A hybrid large vocabulary handwritten word recognition system using neural networks with hidden Markov models
A.L Koerich ... Y Leydier
06 Aug 2002

Synthetic data for Arabic OCR system development
V Margner ... M Pechwitz
01 Sep 2001

Single tree method for grammar directed, very large vocabulary speech recognizer
Richard M Schwartz ... Long Nguyen
The Journal of the Acoustical Society of America | VOL. 102
01 Jan 1997
