Abstract

Handwritten Text Recognition (HTR) is a computer vision task that aims to recognize and transcribe handwritten text from scanned or photographed images. In this project, we propose an HTR system built on Tesseract and OpenCV. Tesseract is a popular open-source optical character recognition (OCR) engine that supports many languages, including English, Chinese, and Arabic. It is typically used to locate and identify printed text in images. OpenCV is a widely used computer vision library that offers tools for processing and analyzing images. The pre-processing step of the proposed system uses OpenCV to improve the quality of the input image, and in turn OCR accuracy, by applying several techniques, including deskewing, noise removal, and binarization. The pre-processed image is then passed to Tesseract for text recognition, and the recognized text is saved to a text file. To recognize handwriting more accurately, the Tesseract OCR engine is trained on a large dataset of handwritten images. The HTR system can be applied in a variety of fields, including document analysis, digitization of historical manuscripts, and postal automation. It can also be used in academic settings to help students transcribe their notes and assignments. The proposed HTR system is therefore expected to offer a reliable and effective method for recognizing and transcribing handwritten text from images.
