Abstract

Images play an essential role in sharing information through electronic media. Nowadays, almost every event is recorded in the form of digital images. Text contained in an image file is not in a machine-readable format. OCR (Optical Character Recognition) for the English language is well established. Currently, there is a need for OCR for Indian languages to preserve historical documents written mainly in Indian languages, to organize publications in libraries, and to process application forms. OCR for the Telugu language is challenging because consonants and vowels combine with vattus and gunithas to form words. A compound character may be a combination of vowels and consonants. This paper surveys the methods used for Telugu OCR to date.

Highlights

  • There has been limited research on the development of a complete Optical Character Recognition (OCR) system for Telugu script

  • The Telugu script is of intermediate complexity: consonant-vowel pairs are written as a single unit

  • While our work was under review, Google Drive added OCR functionality that works for Telugu and many other world languages


Summary

Introduction

There has been limited research on the development of a complete OCR system for Telugu script. The first reported work on Telugu OCR dates back to 1977, when Rajasekharan and Deekshatulu used features that encode the curves tracing a letter and compared this encoding against a set of predefined templates [12]. Later work on Telugu OCR mostly followed the featurization-classification paradigm.
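The consonant-vowel units and vattus mentioned above can be illustrated at the text-encoding level. The following is a minimal sketch, not taken from the paper: it assumes the Unicode Telugu block as the output encoding of an OCR system, and the helper name `describe` is hypothetical. It shows that a single visual unit on the page corresponds to several code points, which is one reason character segmentation is harder for Telugu than for English.

```python
import unicodedata

# Code points from the Unicode Telugu block (U+0C00 to U+0C7F).
# Choosing Unicode as the target encoding is an assumption of this sketch.
KA = "\u0C15"            # consonant KA
VOWEL_SIGN_I = "\u0C3F"  # dependent vowel sign I
VIRAMA = "\u0C4D"        # sign that suppresses the inherent vowel
SSA = "\u0C37"           # consonant SSA


def describe(text):
    """Return the Unicode name of each code point in `text` (hypothetical helper)."""
    return [unicodedata.name(ch) for ch in text]


# Consonant + vowel sign: drawn as a single unit, stored as two code points.
gunitha = KA + VOWEL_SIGN_I

# Consonant + virama + consonant: a conjunct in which the second consonant
# is typically drawn in its subscript (vattu) form.
vattu = KA + VIRAMA + SSA

for label, text in (("gunitha", gunitha), ("vattu", vattu)):
    print(label, text, len(text), describe(text))
```

An OCR system that emits Unicode therefore has to map one segmented glyph cluster to a sequence of code points rather than to a single character class.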
