Abstract

Optical Character Recognition (OCR) systems are being developed for their numerous applications, including for Indian scripts such as Telugu, which are difficult to handle because they use a large number of symbols. OCR systems typically store pre-computed features of the symbols to be recognized in a database. An unknown symbol is recognized by finding the symbol in the database that is nearest to it in feature space. Designing an appropriate database is therefore a critical step, especially when the OCR system must recognize numerous symbols in multiple fonts and sizes. The goal is an OCR system with short recognition times and high recognition accuracy. The naive approach of storing the features of all symbols in all fonts and sizes in the database can be counterproductive on both counts. Experimental results on text document images with multiple fonts and sizes show that the database design strategy proposed in this paper for OCR of printed Telugu text achieves both objectives. This is the first reported approach to such a database design for Telugu OCR.
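
The nearest-match lookup described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and a plain linear scan, neither of which the abstract specifies; the labels, feature values, and the function names `euclidean` and `recognize` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical feature database: symbol label -> pre-computed feature vector.
# Labels and values are invented for illustration, not taken from the paper.
database = {
    "ka": [0.9, 0.1, 0.3],
    "ga": [0.2, 0.8, 0.5],
    "ma": [0.4, 0.4, 0.9],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, db):
    """Return the label whose stored features are nearest to `features`.

    Assumption: a linear scan over every database entry.
    """
    return min(db, key=lambda label: euclidean(features, db[label]))


# Features of an unknown symbol (also invented).
unknown = [0.85, 0.15, 0.25]
print(recognize(unknown, database))  # prints "ka"
```

In a scan like this, lookup cost grows with the number of stored entries, which is one way that storing every symbol in every font and size could lengthen recognition time.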
