This paper presents an Optical Character Recognition (OCR) approach for printed Arabic script, one of the most widely used scripts in the world. Developing an OCR system for Arabic script is difficult because Arabic characters are distinct and the character set contains many structurally similar characters. The proposed approach is divided into three major steps. The first step is digitization, followed by pre-processing such as segmentation, to detect and correct the slant of each character. The second step is feature extraction using gray-level matrices. Finally, the K-Nearest-Neighbors (KNN) classifier is used for classification. The method was tested using 45 patterns for each Arabic character in different fonts (Simplified Arabic, Tahoma, Traditional Arabic). The sample images were divided into 20 training and 25 test images, and images in the test set did not appear in the training set. The method achieved a recognition rate of 90.3%. These results indicate that the proposed method handles the printed Arabic character recognition task efficiently and is a promising technique for recognizing printed Arabic characters.

1. Introduction

Optical character recognition (OCR) deals with the recognition of optically processed characters rather than magnetically processed ones. In a typical OCR system, input characters are read and digitized by an optical scanner. Each character is then located and segmented, and the resulting matrix is fed into a preprocessor. Off-line recognition can be considered the most general case: no special device is required for writing, and signal interpretation is independent of signal generation, as in human recognition [6]. The recognition of Arabic characters has been an area of great interest for many years, and a number of research papers and reports have already been published in this area.
There are several major problems with Arabic character recognition: Arabic characters are distinct and ideographic, and many structurally similar characters exist in the character set (Table 1). Thus, classification criteria are difficult to generate [1][3][6]. The Arabic language has a rich vocabulary. More than 200 million people speak it as their native language, and over 1 billion people use its character set, for example in languages such as Persian and Urdu. Due to the cursive nature of the script, several characteristics make the recognition of Arabic distinct from the recognition of Latin or Chinese script. Arabic character recognition has been studied since the 1980s. However, in comparison with other languages, such as Latin, Chinese and Japanese, relatively little work has been conducted on the automatic recognition of Arabic characters [4][5].
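
The feature extraction and classification steps outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "gray-level matrices" refers to gray-level co-occurrence matrices, and the chosen texture features (contrast, energy, homogeneity), the three pixel offsets, the 8 quantization levels, the Euclidean distance and k = 3 are all illustrative assumptions. The function names `glcm`, `glcm_features` and `knn_predict` are hypothetical.

```python
import numpy as np

def glcm(image, levels=8, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one non-negative offset.

    Assumes an 8-bit grayscale image (values 0..255); this is an assumption,
    not a detail taken from the paper.
    """
    # Quantize 0..255 intensities into `levels` gray-level bins.
    q = (image.astype(np.int64) * levels) // 256
    h, w = q.shape
    # Pair every pixel with its neighbor located dy rows down and dx columns right.
    src = q[0:h - dy, 0:w - dx]
    dst = q[dy:h, dx:w]
    m = np.zeros((levels, levels), dtype=np.float64)
    # Count how often gray level a is followed by gray level b.
    np.add.at(m, (src.ravel(), dst.ravel()), 1.0)
    total = m.sum()
    return m / total if total > 0 else m

def glcm_features(image, levels=8):
    """Contrast, energy and homogeneity for three offsets (9 values in total)."""
    i, j = np.indices((levels, levels))
    feats = []
    for dx, dy in ((1, 0), (0, 1), (1, 1)):
        p = glcm(image, levels, dx, dy)
        feats.append(np.sum(p * (i - j) ** 2))           # contrast
        feats.append(np.sum(p ** 2))                     # energy
        feats.append(np.sum(p / (1.0 + np.abs(i - j))))  # homogeneity
    return np.array(feats)

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    dists = np.linalg.norm(train_x - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: two synthetic 8x8 "characters" made of vertical or horizontal stripes.
vertical = np.zeros((8, 8), dtype=np.uint8)
vertical[:, ::2] = 255
horizontal = vertical.T.copy()

train_x = np.stack([glcm_features(vertical), glcm_features(horizontal)])
train_y = np.array([0, 1])
predicted = knn_predict(train_x, train_y, glcm_features(vertical), k=1)
```

In a setting like the one described above, each of the 20 training images per character would contribute one feature vector to `train_x`, and each of the 25 test images would be classified with `knn_predict`. The slant detection and correction step is not covered by this sketch.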