Farsi Font Recognition Based on the Fonts of Text Samples Extracted by SOM

Majid Ziaratban, Fatemeh Bagheri

doi:10.22436/jmcs.015.01.04

Abstract

A Farsi font recognition algorithm based on the fonts of frequently occurring text samples is proposed. Features are extracted from the connected components of a text image, and the resulting feature vectors are clustered with a Self-Organizing Map (SOM). The clusters with the most members identify the most frequent connected components (MFCCs), and a number of members of these large clusters are extracted from the input image. This procedure is applied to both training and test images. Since the frequent samples in different Farsi texts are very similar, it can be guaranteed that a large number of the MFCC samples detected in a test image are also present in the set of extracted training samples. The font type and font style of the extracted test samples are recognized by matching them against the training samples, and the font recognized most often among the extracted samples is taken as the font of the input text. To make the algorithm more accurate and less complex, the font size is determined in a second phase, after the font type and style have been recognized. A lexicon reduction procedure reduces the complexity and the processing time. The font size is estimated from the size of a particular MFCC in the text image. Experiments show that the proposed method outperforms other font recognition methods.
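The pipeline described in the abstract (cluster component features with a SOM, keep the members of the largest clusters, match them against labeled training samples, then take a majority vote) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions that do not come from the paper: the feature vectors are plain numeric tuples, the SOM is a tiny one-dimensional map, "matching" is approximated by a nearest-neighbor search, and all function and parameter names (`train_som`, `frequent_members`, `recognize_font`, `top_clusters`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_unit(units, v):
    """Index of the SOM unit closest to vector v."""
    return min(range(len(units)), key=lambda j: sq_dist(units[j], v))


def train_som(vectors, n_units=4, epochs=50, lr0=0.5, seed=0):
    """Train a tiny 1-D SOM (assumed topology) and return its unit weights."""
    rng = random.Random(seed)
    units = [list(rng.choice(vectors)) for _ in range(n_units)]
    sigma0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.01    # shrinking neighborhood
        for v in vectors:
            bmu = best_unit(units, v)
            for j, w in enumerate(units):
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (v[k] - w[k])
    return units


def frequent_members(vectors, units, top_clusters=2):
    """Indices of the vectors that fall into the most populated clusters."""
    assign = [best_unit(units, v) for v in vectors]
    big = {c for c, _ in Counter(assign).most_common(top_clusters)}
    return [i for i, a in enumerate(assign) if a in big]


def recognize_font(test_samples, train_samples):
    """Majority vote over nearest-neighbor matches.

    train_samples is a list of (feature_vector, font_label) pairs.
    """
    votes = Counter()
    for t in test_samples:
        _, label = min(train_samples, key=lambda s: sq_dist(s[0], t))
        votes[label] += 1
    return votes.most_common(1)[0][0]


# Toy usage with made-up 2-D features and hypothetical font labels.
train = [((0.0, 0.0), "FontA"), ((10.0, 10.0), "FontB")]
test = [(0.1, 0.2), (0.3, 0.1), (9.9, 10.1)]
print(recognize_font(test, train))  # prints "FontA"
```

In this toy run two of the three test samples lie nearest to the "FontA" training sample, so the vote returns "FontA". A real system would replace the toy tuples with the features extracted from connected components and would need the second phase for font size, which is not sketched here.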
