OCR: Optical Character Recognition

Kishan Kushavaha

doi:10.22214/ijraset.2024.60794

Abstract

Many fields need to store the information contained in printed or handwritten documents and images on a computer for later use. A simple way to do this is to scan the document and save it as an image file. However, the text and other information in such an image file cannot easily be read or queried, which makes the data hard to reuse. A technology is therefore needed that automatically extracts and stores information (especially text) from image files. Optical character recognition (OCR) is a field of research that seeks to create computer systems that can extract and process text from images. The purpose of OCR is to convert any document containing text (handwritten, printed, or scanned) into a digital format that can be processed further. OCR thus enables machines to recognize the text in these documents. Several important issues must be identified and addressed for this automation to succeed. The characteristics of the alphabet and the quality of the characters in the document are just some of the recent challenges. Because of these problems, the computer may not recognize characters correctly. In this article, we examine OCR in four different ways. First, we provide detailed information about the difficulties that may arise during OCR. Second, we examine the general stages of OCR systems, such as preprocessing, segmentation, normalization, feature extraction, classification and post

Full Text