Abstract

Optical Character Recognition (OCR) can be used in many applications, such as machine translation, postal processing, script recognition, text-to-speech, and reading aids for the blind. A Myanmar OCR system is essential for converting the numerous published books, newspapers, and journals of Myanmar into editable computer text files. Recognizing old printed Myanmar documents is challenging because of poor print quality, the absence of standard alphabets and known fonts, ink bleed-through, uneven backgrounds, broken characters, overlapped scripts, and mixed scripts. This paper proposes a new block definition method for isolating characters in printed Myanmar historical text. The proposed Myanmar optical character recognition (MOCR) system uses a local adaptive thresholding method for binarization, followed by skew and slant correction, and applies a thinning algorithm to separate lines and words. Characters are then isolated with the block definition method, and an adaptive neuro-fuzzy inference system (ANFIS) matches their features against a trained database to produce machine-readable text. Myanmar alphabets include consonants, vowels, medials, and digits. The block definition method isolates consonants and vowels easily, which yields a higher OCR accuracy rate. Experimental results on several old Myanmar documents demonstrate the efficiency of the proposed algorithms.
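The abstract does not say which local adaptive thresholding rule the MOCR system uses. As a minimal sketch, assume a Niblack-style rule: each pixel's threshold is the local mean plus k times the local standard deviation. The window size (15) and k = -0.2 below are illustrative defaults, not values from the paper, and the function name `niblack_binarize` is hypothetical. The sketch shows how a per-pixel threshold copes with the uneven backgrounds of old pages better than a single global threshold.

```python
import math
from typing import List


def niblack_binarize(gray: List[List[float]], window: int = 15, k: float = -0.2) -> List[List[int]]:
    """Hypothetical Niblack-style local thresholding (an assumed stand-in for
    the paper's unspecified method). Returns 1 for ink (dark text), 0 for background."""
    h = len(gray)
    w = len(gray[0]) if h else 0

    # Integral images of intensity and squared intensity, padded with a zero row/column,
    # so each local mean and variance costs O(1) regardless of window size.
    s1 = [[0.0] * (w + 1) for _ in range(h + 1)]
    s2 = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        row_sq = 0.0
        for x in range(w):
            v = float(gray[y][x])
            row_sum += v
            row_sq += v * v
            s1[y + 1][x + 1] = s1[y][x + 1] + row_sum
            s2[y + 1][x + 1] = s2[y][x + 1] + row_sq

    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        # Clip the window at the image border.
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            n = (y1 - y0) * (x1 - x0)
            total = s1[y1][x1] - s1[y0][x1] - s1[y1][x0] + s1[y0][x0]
            total_sq = s2[y1][x1] - s2[y0][x1] - s2[y1][x0] + s2[y0][x0]
            mean = total / n
            var = max(0.0, total_sq / n - mean * mean)  # guard tiny negative round-off
            threshold = mean + k * math.sqrt(var)
            # Pixels darker than the local threshold are treated as ink.
            out[y][x] = 1 if gray[y][x] < threshold else 0
    return out
```

For example, a light 5x5 patch (value 200) containing one dark pixel (value 20) at its centre, binarized with `window=3`, marks only that centre pixel as ink. A larger window typically tolerates uneven backgrounds better but can merge nearby strokes, so the window is usually tuned to the expected character size.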
