A new combined method for character recognizing in Farsi printed scripts using principal component analysis

Shahrouz Gashmard, Alireza Mehri Dehnavi, Hossein Rabbani

doi:10.1145/2345396.2345528

Abstract

This paper introduces a new method for character recognition in printed Farsi scripts using Principal Component Analysis (PCA). The materials used in this project are taken from ordinary books, magazines, newspapers, and printed documents. Character samples are drawn from 4 fonts and 3 sizes, and the total number of recognition classes is set to 20. The methods evaluated are a statistical method, the Fast Zernike Wavelet Moment (FZWM) method, PCA, PCA with sample averaging, and PCA with eigenvector averaging. Based on a comparison of their results, a novel method for Farsi character recognition is introduced that uses PCA with combined averaging of samples and eigenvectors. Our simulations show that, for Farsi character recognition, PCA with combined averaging of samples and eigenvectors improves accuracy by 2.42% over the statistical method and by 5.87% over the FZWM method. It is also 7.6 times faster to compute than the statistical method and 5.15 times faster than the FZWM method.
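The abstract does not specify how the averaging over eigenvectors is carried out, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that a PCA basis is fitted separately for each font/size group of a class, that the per-group bases are sign-aligned, averaged, and re-orthonormalized, and that a character is assigned to the class whose subspace reconstructs it with the smallest error. All function names (`pca_basis`, `averaged_basis`, `classify`), the parameter values, and the synthetic vectors standing in for character images are hypothetical.

```python
import numpy as np

def pca_basis(X, k):
    """Return (mean, basis); basis has shape (k, d) with orthonormal rows."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def averaged_basis(groups, k):
    """Average per-group PCA bases (assumed: one group per font/size)."""
    means, bases = zip(*(pca_basis(G, k) for G in groups))
    ref = bases[0]
    aligned = []
    for B in bases:
        # Eigenvectors are defined only up to sign; flip to match the reference.
        signs = np.sign(np.sum(B * ref, axis=1, keepdims=True))
        signs[signs == 0] = 1.0
        aligned.append(B * signs)
    avg = np.mean(aligned, axis=0)
    # Averaging breaks orthonormality; a reduced QR factorization restores it.
    Q, _ = np.linalg.qr(avg.T)
    return np.mean(means, axis=0), Q.T

def reconstruction_error(x, mean, basis):
    """Distance from x to the affine subspace defined by (mean, basis)."""
    centered = x - mean
    projected = basis.T @ (basis @ centered)
    return np.linalg.norm(centered - projected)

def classify(x, models):
    """Pick the class whose subspace reconstructs x with the smallest error."""
    return min(models, key=lambda c: reconstruction_error(x, *models[c]))

# Synthetic stand-in for character feature vectors (assumed sizes).
rng = np.random.default_rng(0)
d, k, n_classes, n_groups, n_per_group = 16, 2, 3, 4, 30

def make_group(c):
    # Each group is a slightly shifted, noisy cloud around the class centre.
    center = 10.0 * np.eye(d)[c] + 0.05 * rng.standard_normal(d)
    return center + 0.1 * rng.standard_normal((n_per_group, d))

models = {c: averaged_basis([make_group(c) for _ in range(n_groups)], k)
          for c in range(n_classes)}
test_x = make_group(1)[0]
print(classify(test_x, models))
```

Fitting one small basis per group and averaging them is one way such a scheme could reduce computation compared with a single decomposition over all samples, but whether this matches the speed-up reported in the abstract is not something the sketch can confirm.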
