Abstract

A large number of scripts are in use for writing today. Script identification has many applications, such as sorting documents and preparing online document databases. Identifying scripts, especially across different orientations and scales, is an important and challenging problem in document image analysis. This paper proposes a new scheme for script identification from word images using skew- and scale-robust log-polar curvelet features. The word images are first extracted as text patches from documents using Gaussian filtering. Texture features are then computed using the curvelet transform in the log-polar domain. The log-polar domain is robust to rotation and scale variations, whereas the curvelet transform exhibits directional and anisotropic properties; together, these help in extracting significant features. For the experiments, a k-nearest neighbor classifier is employed to identify the scripts, as it has zero training time and is simple to implement. Further, a statistical significance test is performed using two more classifiers, namely random forest and support vector machine. Comprehensive experiments are carried out on the ALPH-REGIM, Pati and Ramakrishnan, PHDIndic_11, and proprietary databases, which contain printed as well as handwritten text. Bi-script, tri-script, and multi-script identification results are reported. Benchmarking analysis illustrates the effectiveness of the proposed method, with a maximum recall rate of 98.76%.
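
The rotation robustness attributed to the log-polar domain can be illustrated with a small sketch. This is not the paper's implementation: the function names `log_polar` and `angular_spectrum`, the grid sizes, and the random test patch are all assumptions made for illustration, and the FFT magnitude along the angular axis is a simple stand-in for the curvelet features, which are not reproduced here. The sketch resamples a patch onto a log-polar grid, where a rotation of the input appears as a circular shift along the angle axis, and then checks that a shift-invariant descriptor is unchanged by a 90-degree rotation.

```python
import numpy as np

def log_polar(image, n_r=32, n_theta=64):
    """Resample a grayscale image onto a log-polar grid (bilinear sampling).

    Rows of the result index log-radius, columns index angle.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    # Radii uniform in log(r); angles uniform in [0, 2*pi).
    radii = np.exp(np.linspace(0.0, np.log(r_max), n_r))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    # Cartesian sampling positions, clipped to the image bounds.
    y = np.clip(cy + rr * np.sin(tt), 0, h - 1)
    x = np.clip(cx + rr * np.cos(tt), 0, w - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    wy, wx = y - y0, x - x0
    # Bilinear interpolation between the four neighbouring pixels.
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x0 + 1]
    bottom = (1 - wx) * image[y0 + 1, x0] + wx * image[y0 + 1, x0 + 1]
    return (1 - wy) * top + wy * bottom

def angular_spectrum(lp):
    """FFT magnitude along the angle axis (stand-in for curvelet features).

    A circular shift along the angle axis leaves this magnitude unchanged.
    """
    return np.abs(np.fft.fft(lp, axis=1))

rng = np.random.default_rng(0)
patch = rng.random((65, 65))  # hypothetical stand-in for a word-image patch

lp = log_polar(patch)
lp_rot = log_polar(np.rot90(patch))

# A 90-degree rotation corresponds to a shift of n_theta / 4 angle columns.
quarter = lp.shape[1] // 4
shift_error = np.abs(lp_rot - np.roll(lp, -quarter, axis=1)).max()

# The shift-invariant descriptor should match for both orientations.
feature_error = np.abs(angular_spectrum(lp) - angular_spectrum(lp_rot)).max()
```

In this sketch both `shift_error` and `feature_error` should be at the level of floating-point rounding. Scaling the input would similarly appear as a shift along the log-radius axis, although only approximately for a finite patch, since content enters or leaves the sampled radius range.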
