Lexicon reduction using dots for off-line Farsi/Arabic handwritten word recognition

Saeed Mozaffari, Karim Faez, Volker Märgner, Haikal El-Abed

doi:10.1016/j.patrec.2007.11.009

Abstract

Unlike many other languages, 18 out of 32 Farsi characters have dots appearing in groups of one, two or three. Some of these letters share common primary shapes, differing only in the number of dots and whether the dots are above or below the primary shape. In this paper, a new concept of using dots in a cursively handwritten Farsi/Arabic word is introduced for lexicon reduction, and a fast method for extracting dots is presented. The technique involves extracting and representing the number and position of dots in off-line handwritten words in order to eliminate unlikely candidates. Experimental results on a set of 12,000 handwritten word images yield a lexicon reduction of 93% with an accuracy of 85%. The proposed lexicon reduction algorithm achieves a speedup factor of 2 as well as a 13% improvement in recognition rate.
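To illustrate the idea of eliminating candidates by their dots, the following is a minimal sketch and not the algorithm of the paper. It assumes that the dot groups of a word image have already been extracted as a sequence of (count, position) pairs, and that a lexicon entry is kept only when its expected dot sequence matches exactly. The names `LETTER_DOTS`, `dot_signature` and `reduce_lexicon` are hypothetical, the letter table covers only a small subset of the dotted letters, and exact matching is a simplifying assumption (real handwriting often merges or displaces dots, which a practical system would have to tolerate).

```python
from typing import Dict, List, Tuple

# A dot group is (number_of_dots, position): "A" = above, "B" = below.
DotGroup = Tuple[int, str]

# Illustrative subset of dotted letters (assumption: not the full set of 18).
LETTER_DOTS: Dict[str, DotGroup] = {
    "ب": (1, "B"),
    "پ": (3, "B"),
    "ت": (2, "A"),
    "ث": (3, "A"),
    "خ": (1, "A"),
    "ز": (1, "A"),
    "ش": (3, "A"),
    "ن": (1, "A"),
}


def dot_signature(word: str) -> Tuple[DotGroup, ...]:
    """Expected dot groups of a lexicon word, in reading order.

    Letters missing from LETTER_DOTS are treated as undotted.
    """
    return tuple(LETTER_DOTS[ch] for ch in word if ch in LETTER_DOTS)


def reduce_lexicon(observed: Tuple[DotGroup, ...], lexicon: List[str]) -> List[str]:
    """Keep only the entries whose expected dot signature equals the observed one."""
    return [word for word in lexicon if dot_signature(word) == observed]


if __name__ == "__main__":
    lexicon = ["شب", "تب", "نان", "زن", "آب", "دست"]
    # Hypothetical extractor output: two single dots, both above the primary shape.
    observed = ((1, "A"), (1, "A"))
    candidates = reduce_lexicon(observed, lexicon)
    reduction = 1 - len(candidates) / len(lexicon)
    print(candidates, f"reduction = {reduction:.0%}")
```

In this toy lexicon, two words share the observed signature, so the classifier would only need to choose between those two instead of all six entries.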
