OCR with Word Prediction Technique for Bilingual Documents

S Tangwongsan, B Suvacharakulton

doi:10.1109/icis.2012.77

Abstract

This paper proposes a working model of a bilingual OCR system for printed Thai and English text that uses a word prediction technique. The main idea is that, instead of recognizing individual characters from an image block as in the conventional approach, the system attempts to match the whole word against a list of predicted words based on n-gram trees. Matching takes place in the word verification stage, in which both positive and negative matching are performed. If there is a match, the system advances past the end of the matched word's boundary to the next position. The longer the matched word, the better the system performance. Experimental results show an average speed improvement of 21% while maintaining the expected recognition accuracy.
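The predict-then-verify loop described in the abstract can be illustrated with a small sketch. It rests on several assumptions that are not in the abstract: a plain character trie stands in for the n-gram trees, a ground-truth string stands in for the image block, and a string comparison stands in for whole-word image verification. The names `build_trie`, `predict_words`, and `recognize_line` are hypothetical and are not the authors' implementation.

```python
from typing import Dict, List, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: List[str]) -> TrieNode:
    """Build a character trie (assumed stand-in for the paper's n-gram trees)."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def predict_words(root: TrieNode, prefix: str) -> List[str]:
    """Return all dictionary words that start with prefix, longest first."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    found = []
    stack = [(node, prefix)]
    while stack:
        cur, text = stack.pop()
        if cur.is_word:
            found.append(text)
        for ch, child in cur.children.items():
            stack.append((child, text + ch))
    return sorted(found, key=lambda w: (-len(w), w))


def recognize_line(line: str, root: TrieNode) -> Tuple[str, int]:
    """Simulate recognition of one text line.

    Returns the recognized text and the number of single-character
    recognitions that were needed.
    """
    out = []
    char_calls = 0
    pos = 0
    while pos < len(line):
        # Assumption: reading line[pos] stands in for a per-character classifier.
        first = line[pos]
        char_calls += 1
        matched = None
        for cand in predict_words(root, first):
            # Assumption: startswith stands in for whole-word image verification.
            if line.startswith(cand, pos):
                matched = cand
                break
        if matched:
            out.append(matched)
            pos += len(matched)  # advance past the matched word boundary
        else:
            out.append(first)
            pos += 1  # fall back to character-by-character recognition
    return "".join(out), char_calls


trie = build_trie(["the", "then", "cat"])
text, calls = recognize_line("then cat x", trie)
print(text, calls)
```

In this toy run the 10-character line needs only 5 single-character recognitions, because "then" and "cat" are each accepted as whole words. This mirrors the abstract's point that longer matched words yield larger savings, although real verification cost depends on the image matcher, which the sketch does not model.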

Full Text