Abstract
Optical character recognition (OCR) is the electronic conversion of scanned images of handwritten, typewritten, or printed text into machine-encoded text. OCR systems are available for various languages and scripts, such as English, Chinese, and Arabic, but no commercial system is available for the Odia script. We have taken a step toward developing an OCR system for the Odia language. OCR is popular for its potential applications in banks, library automation, post offices, defense organizations, and language processing. Line and word segmentation is one of the important steps of an OCR system: the accuracy of word and character recognition depends directly on the correctness of text-line and word segmentation. In this paper we propose a robust method for segmenting individual text lines from a printed Odia document image. Each segmented text line is the input to the word segmentation method, which produces segmented words. The technique is based on the intensities of pixels in the document and uses both foreground and background information. We tested our method on scanned Odia-script documents as well as some multi-script documents and obtained encouraging results.
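The abstract does not spell out the segmentation algorithm, so the following is only an illustrative sketch of the two-stage pipeline it describes (lines first, then words within each line). It assumes a conventional projection-profile approach, which may differ from the method proposed in the paper. The function names (`find_runs`, `segment_lines`, `segment_words`) and the parameters `ink_threshold` and `min_gap` are hypothetical, and the image is assumed to be a list of rows of grayscale intensities in which dark pixels are foreground.

```python
def find_runs(profile, threshold=0):
    """Return (start, end) pairs, end exclusive, where profile > threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a foreground run begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # a background gap ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image, ink_threshold=128):
    """Split a page into text lines using per-row foreground pixel counts.

    Assumption: pixels darker than ink_threshold are foreground (ink).
    """
    row_profile = [sum(1 for p in row if p < ink_threshold) for row in image]
    return find_runs(row_profile)


def segment_words(image, line, ink_threshold=128, min_gap=2):
    """Split one text line (top, bottom) into words using per-column counts.

    Assumption: background gaps narrower than min_gap columns separate
    characters of the same word, so the runs on either side are merged.
    """
    top, bottom = line
    width = len(image[0])
    col_profile = [
        sum(1 for r in range(top, bottom) if image[r][c] < ink_threshold)
        for c in range(width)
    ]
    words = []
    for start, end in find_runs(col_profile):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)   # narrow gap: same word
        else:
            words.append((start, end))        # wide gap: new word
    return words
```

In this sketch the foreground drives the profiles, while the background gaps between runs decide where lines and words are cut. A suitable `min_gap` would depend on font size and scan resolution, and would need tuning for real Odia documents.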