Neural Network based Bilingual OCR System: Experiment with English and Kannada Bilingual Documents

Dr. S. Basavaraj Patil

doi:10.5120/1803-2279

Abstract

This paper presents a neural network based bilingual OCR system that reads printed document images containing two scripts, English (Roman) and Kannada. Such systems are in demand for automating multi-script, multilingual document processing. The developed system comprises a document image pre-processor, a dynamic feature extractor, a neural network based script classifier, a Kannada character recognition system and an English character recognition system. The document image pre-processor accepts the bilingual document image, converts it from grey scale to two-tone, and segments it into lines and words. The dynamic feature extractor extracts the same number of distinctive features from each separated word, irrespective of the size of the word. A probabilistic neural classifier accepts these features and sorts the words by script, Kannada or Roman. The Kannada character recognition system accepts the Kannada words, segments each word into characters, and maps the recognized characters to the corresponding ASCII values of the chosen Kannada font. Similarly, the English character recognition system segments the English words into characters and maps them to the corresponding ASCII values of the specific English font. The recognized English and Kannada characters are then written to separate ASCII files, one per language. The results are encouraging and demonstrate the effectiveness of the approach.

General Terms

Pattern Recognition, Script Identification, Neural Networks, Optical Character Recognizer (OCR)
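The front end of the pipeline described in the abstract (two-tone conversion, line and word segmentation, a fixed number of features per word, and a probabilistic neural classifier for script identification) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the fixed threshold of 128, projection-profile segmentation, ink-density zoning on a 4 x 8 grid as the feature set, the Gaussian kernel width `sigma`, and all function names.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Grey-scale to two-tone: 1 = ink (dark pixel), 0 = background.
    The fixed threshold is an assumption, not the paper's method."""
    return (np.asarray(gray) < threshold).astype(np.uint8)

def split_runs(profile):
    """Return (start, end) pairs for each run of non-zero profile entries."""
    on = np.concatenate(([0], (np.asarray(profile) > 0).astype(np.int8), [0]))
    d = np.diff(on)
    starts = np.flatnonzero(d == 1)
    ends = np.flatnonzero(d == -1)
    return list(zip(starts.tolist(), ends.tolist()))

def segment_words(line, min_gap=3):
    """Split a binary text line into words using its column profile.
    Ink runs closer than min_gap columns are merged into one word."""
    words = []
    for s, e in split_runs(line.sum(axis=0)):
        if words and s - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], e)
        else:
            words.append((s, e))
    return words

def zone_features(word, rows=4, cols=8):
    """Ink density on a fixed rows x cols grid, so every word yields
    rows * cols features regardless of its pixel size (hypothetical
    stand-in for the paper's dynamic feature extractor)."""
    h, w = word.shape
    r = np.linspace(0, h, rows + 1).astype(int)
    c = np.linspace(0, w, cols + 1).astype(int)
    feats = [
        # max(...) keeps each zone at least one pixel wide and tall
        word[r[i]:max(r[i + 1], r[i] + 1), c[j]:max(c[j + 1], c[j] + 1)].mean()
        for i in range(rows) for j in range(cols)
    ]
    return np.array(feats)

def pnn_classify(x, train_X, train_y, sigma=0.2):
    """Probabilistic neural network: average a Gaussian kernel over the
    training patterns of each class and return the best-scoring label."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    d2 = ((train_X - np.asarray(x, dtype=float)) ** 2).sum(axis=1)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    labels = sorted(set(train_y.tolist()))
    scores = [k[train_y == lab].mean() for lab in labels]
    return labels[int(np.argmax(scores))]
```

In use, text lines would come from `split_runs(binary.sum(axis=1))`, each line would be passed to `segment_words`, and the `zone_features` vector of each word would be classified against labelled Kannada and Roman training words with `pnn_classify`. The character-level recognizers for each script are outside the scope of this sketch.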
