Efficient Arabic text extraction and recognition using thinning and dataset comparison technique

Abdul Khader Jilani Saudagar, Habeeb Vulla Mohammed, Kamran Iqbal, Yasir Javed Gyani

doi:10.1109/iccict.2015.7045725


https://doi.org/10.1109/iccict.2015.7045725


Abstract

This paper proposes a novel technique for Arabic text extraction and recognition. It is part of research work aimed at developing a system that extracts moving Arabic text from video for efficient content-based indexing and searching. Numerous text extraction techniques have been proposed in the past, but very few of them focus on Arabic text, and none of the earlier implementations attains 100% accuracy in the text extraction and recognition process. The proposed technique thins the given sample image containing Arabic text and splits the resulting image horizontally (in the X-axis direction) from right to left at equal intervals. The number of white pixels in each part of the image is then compared with that of the samples in the dataset. When a match is found, the corresponding character is looked up by its index value and stored in an array. This process is repeated with a different splitting interval until all the characters in the sample image are recognized. To our knowledge, our research is the first to address the above problem and to propose a solution with increased retrieval accuracy and reduced computation time for Arabic text extraction and recognition.
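The split-and-compare loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is assumed to be already thinned and binarized (1 = white stroke pixel), the dataset is reduced to one white-pixel count per sample, and the names `count_white`, `recognize`, `sample_counts`, `characters`, and `intervals` are hypothetical.

```python
from typing import List, Sequence

# Assumption: rows of 0/1 pixels, where 1 is a white (stroke) pixel after thinning.
Image = List[List[int]]


def count_white(image: Image, x_start: int, x_end: int) -> int:
    """Count white pixels in the vertical strip covering columns [x_start, x_end)."""
    return sum(row[x] for row in image for x in range(x_start, x_end))


def recognize(
    image: Image,
    sample_counts: Sequence[int],  # hypothetical: white-pixel count of each dataset sample
    characters: Sequence[str],     # hypothetical: character stored at the same index
    intervals: Sequence[int],      # candidate strip widths, tried in order
) -> List[str]:
    """Return the recognized characters in right-to-left order, or [] if no interval works."""
    width = len(image[0])
    for interval in intervals:
        if width % interval != 0:
            continue  # only equal-width strips are considered in this sketch
        recognized: List[str] = []
        # Walk the image along the X axis from right to left in equal steps.
        for right in range(width, 0, -interval):
            white = count_white(image, right - interval, right)
            if white not in sample_counts:
                break  # a strip is unmatched, so try the next interval
            # The index of the matching sample selects the character.
            recognized.append(characters[sample_counts.index(white)])
        else:
            return recognized  # every strip matched a dataset sample
    return []
```

For a 2 x 4 image such as `[[1, 0, 1, 1], [0, 0, 0, 0]]` with `sample_counts=[2, 1]` and `intervals=[4, 2]`, the full-width strip finds no match, so the sketch falls back to width 2 and matches both strips. Matching on pixel count alone is taken directly from the abstract; how the dataset samples are built, and how ties between samples with equal counts are resolved, is not stated there and is left open here.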
