Abstract

Character recognition is the extraction of printed or handwritten text from images into a machine-readable format. The extracted text can then be edited and stored efficiently. While several Optical Character Recognition (OCR) and Handwritten Character Recognition (HCR) systems are available for English, such systems are not well developed for Indian languages such as Gujarati. This work addresses text recognition for the Gujarati script. Two models were analyzed for this task: the CNN-based EfficientNet B3 and YOLO v4. The system was built on EfficientNet B3, which gave better accuracy and efficiency of the two. The input to the system is an image containing Gujarati text, and the output is an editable text document containing the text recognized in the image. The system has been successfully used to create a digital library of Gujarati newspaper articles from their images. This project is a step toward the cultural and linguistic preservation of the Gujarati language.

