Abstract

Farsi font detection is the first stage in Farsi optical character recognition (FOCR) of scanned printed texts. To this end, this paper proposes an improved version of the speeded-up robust features (SURF) algorithm as the feature detector in the font recognition process. The SURF algorithm creates many redundant features during the detection phase. The presented version therefore employs the redundant keypoint elimination method (RKEM), which enhances the matching performance of SURF by removing unnecessary keypoints. Although the performance of the RKEM is acceptable in this task, it relies on a fixed, experimentally chosen threshold value, which has a detrimental impact on the results. In this paper, an Adaptive RKEM is proposed for the SURF algorithm; it considers image type and distortion when adjusting the threshold value. This improved version is then applied to recognize Farsi fonts in texts: the proposed Adaptive RKEM-SURF detects the keypoints, and SURF is used as the descriptor for the features. Finally, matching is performed using the nearest neighbor distance ratio. The proposed approach is compared with recently published algorithms for FOCR to confirm its superiority. The method can be generalized to other languages such as Arabic and English.
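The two generic steps named in the abstract, pruning keypoints that lie too close together and matching descriptors by the nearest neighbor distance ratio, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `(x, y, response)` keypoint layout, the keep-the-strongest pruning rule and the default ratio of 0.8 are all hypothetical, and the threshold is simply passed in rather than adapted to image type and distortion as the Adaptive RKEM does.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prune_close_keypoints(keypoints, threshold):
    """Simplified stand-in for redundant keypoint elimination (not the paper's RKEM rule).

    keypoints: list of (x, y, response) tuples (assumed layout).
    Visits keypoints from strongest to weakest response and keeps one only
    if it is at least `threshold` away from every keypoint already kept.
    """
    kept = []
    for kp in sorted(keypoints, key=lambda k: k[2], reverse=True):
        if all(euclidean(kp[:2], other[:2]) >= threshold for other in kept):
            kept.append(kp)
    return kept


def nndr_match(query, train, ratio=0.8):
    """Nearest neighbor distance ratio matching.

    For each query descriptor, find its nearest and second-nearest train
    descriptors; accept the match only if d1 / d2 < ratio.
    Returns a list of (query_index, train_index) pairs.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        if len(dists) < 2:
            continue  # the ratio needs two neighbors
        (d1, best), (d2, _) = dists[0], dists[1]
        if d2 > 0 and d1 / d2 < ratio:
            matches.append((qi, best))
    return matches


if __name__ == "__main__":
    kps = [(0.0, 0.0, 1.0), (0.5, 0.0, 2.0), (10.0, 10.0, 0.5)]
    print(prune_close_keypoints(kps, threshold=1.0))

    query = [(0.0, 0.0), (5.0, 5.0)]
    train = [(0.0, 0.1), (10.0, 10.0), (5.0, 5.2), (4.9, 5.1)]
    print(nndr_match(query, train, ratio=0.5))
```

A lower ratio rejects more ambiguous matches: in the usage example, the second query descriptor has two nearly equidistant neighbors, so it is discarded at a ratio of 0.5.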

Highlights

  • Farsi is the official language of Iran, Tajikistan and Afghanistan

  • This paper proposes an approach for Farsi font detection based on an improved version of the speeded-up robust features (SURF) algorithm

  • We evaluate the performance of the proposed Adaptive RKEM-SURF algorithm and compare it with the RKEM-scale-invariant feature transform (SIFT) [18], the method of [28], the Sobel–Roberts features in [13] and SIFT [8]

Summary

Introduction

Farsi is the official language of Iran, Tajikistan and Afghanistan. Farsi is among the top three languages of the world in terms of the number and variety of proverbs [1]. The aim is to build Farsi OCR systems that are comparable in accuracy and performance to English OCR systems. Font detection is one of the most useful pre-processing steps for improving OCR performance in systems that deal with typeset, printed and scanned texts containing several different fonts [4, 5]. No method developed so far for Farsi font detection is comparable to those for English in terms of recognition accuracy. Given the importance and widespread use of OCR and the low accuracy of existing methods, developing effective Farsi OCR systems is both necessary and challenging. This motivates approaches that improve Farsi font detection so that it reaches acceptable detection rates.

