Abstract

Script identification is an essential step in document image processing, especially in a multi-script or multilingual environment. To date, researchers have developed several methods for this problem. For a complex pattern recognition problem of this kind, it is difficult to decide in advance which classifier would be the best choice. Moreover, different classifiers offer complementary information about the patterns to be classified, so combining classifiers in an intelligent way can be more beneficial than using any single classifier. With this in mind, this paper combines the information provided by one shape-based and two texture-based features, using classifier combination techniques, for word-level script recognition from handwritten document images. The CMATERdb8.4.1 database contains 7200 handwritten word samples belonging to 12 Indic scripts (600 per script) and is made freely available at https://code.google.com/p/cmaterdb/. The word samples from this database are classified based on the confidence scores provided by a Multi-Layer Perceptron (MLP) classifier. Major classifier combination techniques, including majority voting, Borda count, sum rule, product rule, max rule, the Dempster-Shafer (DS) rule of combination, and secondary classifiers, are evaluated for this pattern recognition problem. A maximum accuracy of 98.45% is achieved on the validation set, an improvement of 7% over the best-performing individual classifier.

Highlights

  • In the domain of document image processing, Optical Character Recognition (OCR) systems are generally developed with a particular script in mind, which means that such systems can read characters written in that specific script only

  • This implies that before document images are fed to an OCR system, the script in which the document is written must be identified, so that the images can be converted into a computer-editable format by the OCR system built for that script

  • No standard benchmark database of handwritten Indic scripts is freely available in the public domain


Summary

Introduction

In the domain of document image processing, Optical Character Recognition (OCR) systems are generally developed with a particular script in mind, which means that such systems can read characters written in that specific script only. The key idea of classifier combination is that, instead of relying on a single decision maker, all the classifier designs or a subset of them are used for decision making, and their individual beliefs are combined to reach a consensus decision. This fact motivates many researchers to apply classifier combination methods to different pattern recognition problems. This paper applies different classifier combination techniques in the field of Indic script recognition. The main contribution of the present work is the comprehensive evaluation of the major classifier combination approaches, which are either rule based or apply a secondary classifier for information fusion. The motivation is to improve the classification accuracy of word-level handwritten script recognition by combining the results of the best performing classifier on three previously used feature sets.
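The consensus idea described above can be sketched for a single word sample as follows. This is a minimal illustration, not the authors' implementation: the function names, the three-class setup, and the confidence scores in `example_scores` are all hypothetical, and each inner list stands in for the per-class confidence scores that one classifier (for example, an MLP trained on one feature set) might output.

```python
import math
from collections import Counter


def _argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=values.__getitem__)


def majority_vote(scores):
    """Each classifier votes for its top class; the most voted class wins."""
    votes = [_argmax(s) for s in scores]
    return Counter(votes).most_common(1)[0][0]


def borda_count(scores):
    """Each classifier ranks the classes; a class earns one point
    for every class ranked below it, and points are summed."""
    n_classes = len(scores[0])
    points = [0] * n_classes
    for s in scores:
        ascending = sorted(range(n_classes), key=lambda c: s[c])
        for rank, c in enumerate(ascending):
            points[c] += rank
    return _argmax(points)


def sum_rule(scores):
    """Pick the class with the largest summed confidence."""
    n_classes = len(scores[0])
    return _argmax([sum(s[c] for s in scores) for c in range(n_classes)])


def product_rule(scores):
    """Pick the class with the largest product of confidences."""
    n_classes = len(scores[0])
    return _argmax([math.prod(s[c] for s in scores) for c in range(n_classes)])


def max_rule(scores):
    """Pick the class holding the single highest confidence from any classifier."""
    n_classes = len(scores[0])
    return _argmax([max(s[c] for s in scores) for c in range(n_classes)])


# Hypothetical confidence scores: 3 classifiers (rows) x 3 script classes (columns).
example_scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.5, 0.4],
]

for rule in (majority_vote, borda_count, sum_rule, product_rule, max_rule):
    print(rule.__name__, "->", rule(example_scores))
```

With these made-up scores, four of the rules agree on class 1, while the max rule follows the single most confident classifier and picks class 0. That kind of disagreement is one reason to evaluate several combination rules rather than assume one is best.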

Feature Extraction
Elliptical
Sectional
Concentric
Histogram
Modified log-Gabor transform
Classifier
Rule Based Combination Techniques
Majority Voting
Borda Count
Elementary Combination Approaches on Measurement Level
Dempster-Shafer Theory of Evidence
Secondary Classifier Based Combination Techniques
Preparation of Database
Sample images written in 12 different Indian scripts
Performance Analysis
Conclusions
