Myanmar optical character recognition using block definition and featured approach

Zu Zu Aung, Cho Me Me Maung

doi:10.1109/icsitech.2017.8257131

Abstract

Optical Character Recognition (OCR) is used in many applications, such as machine translation, postal processing, script recognition, text-to-speech, and reading aids for the blind. A Myanmar OCR system is essential for converting the many published books, newspapers and journals of Myanmar into editable computer text files. Recognizing old printed Myanmar documents is challenging because of poor print quality, the absence of standard alphabets and known fonts, ink bleed-through, uneven backgrounds, broken characters, overlapping scripts and mixed scripts. This paper presents a new block definition method for isolating characters in printed Myanmar historical text. The proposed Myanmar optical character recognition (MOCR) system uses local adaptive thresholding for binarization together with skew-slant correction, and applies a thinning algorithm to separate lines and words. To isolate characters, the block definition method is applied, and an adaptive neuro-fuzzy inference system (ANFIS) matches the extracted features against the trained database to produce machine-readable text. The Myanmar alphabet includes consonants, vowels, medials and digits. With the block definition method, consonants and vowels are isolated easily, which improves the accuracy of the OCR. Experimental results on a variety of old Myanmar documents are presented for the proposed algorithms.
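As an illustration of the binarization step only, the following is a minimal sketch of local adaptive thresholding using a local mean. The abstract does not say which local thresholding rule the authors use, so the mean-based rule, the function name `local_mean_threshold`, and the `window` and `offset` parameters are all assumptions made for this example rather than the paper's method.

```python
def local_mean_threshold(image, window=3, offset=10):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    Assumed rule (not taken from the paper): a pixel is marked as ink (1)
    when it is darker than the mean of its window x window neighbourhood
    minus `offset`; otherwise it is background (0).
    """
    height, width = len(image), len(image[0])
    half = window // 2
    binary = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Clip the neighbourhood at the image borders.
            rows = range(max(0, y - half), min(height, y + half + 1))
            cols = range(max(0, x - half), min(width, x + half + 1))
            total = sum(image[r][c] for r in rows for c in cols)
            local_mean = total / (len(rows) * len(cols))
            if image[y][x] < local_mean - offset:
                binary[y][x] = 1
    return binary


# Toy page with an uneven background that darkens from left to right,
# plus two dark "ink" pixels in the middle row.
background = [200, 190, 180, 170, 160, 150, 140]
page = [list(background), list(background), list(background)]
page[1][1] = 80
page[1][5] = 60

binary = local_mean_threshold(page, window=3, offset=10)
```

Because each pixel is compared with its own neighbourhood rather than with one global cut-off, the gradual change in background brightness does not by itself produce ink pixels, which is the property that makes local methods attractive for degraded documents with uneven backgrounds.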

Full Text