Optical character recognition (OCR) is a powerful technology that converts images containing text into an editable and searchable format. It makes working with documents more efficient, increases the accessibility of information, and helps automate many processes. The technology was first used in the early 1990s, when historical newspapers were digitized to create electronic archives. In recent years, optical character recognition systems have been refined almost to the "ideal": current systems show near-perfect text recognition accuracy. However, this requires several conditions to be met: evenness and contrast of the characters, a uniform text background, and sufficient contrast between the background and the letters. When these conditions are not met, recognizing characters and text from images becomes much more difficult, and solving such problems is an urgent requirement for practical tasks in the military sphere. Text recognition from images has many important applications, which makes the topic relevant and necessary. The article analyzes several of the most well-known and popular artificial intelligence tools for text recognition from images, such as Tesseract OCR, PyTorch, EasyOCR, Keras-OCR, and OpenCV, on specific types of images, and obtains results for recognizing text and characters from images of varying complexity. To assess the accuracy of character and text recognition, a recognition accuracy assessment method has been developed based on special evaluation metrics that compare the recognized text with the reference (correct) text. The most common such metrics are the character recognition accuracy (CAR) and the word recognition accuracy (WAR). Using the developed method, the recognition accuracy of the most popular optical text and character recognition tools has been analyzed on images of varying complexity. The analysis showed that the EasyOCR model is the most efficient and accurate: even under strong "noise" and poor image contrast it produced a stable result and, with further customization for the user's needs, can be used to solve a specific task.
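For illustration, the following is a minimal sketch of how recognized text could be scored against a reference transcription in the spirit of the CAR and WAR metrics described above. It assumes (this is not taken from the article) that both metrics are computed as one minus a normalized Levenshtein distance at the character and word level, respectively; the image file name "sample.png" and the reference string are placeholders, and EasyOCR is used as the recognition engine.

```python
# Hedged sketch: run EasyOCR on an image and score the output against a
# reference transcription using character- and word-level accuracy.
# Assumption (not from the article): CAR and WAR are taken as
# 1 - normalized Levenshtein distance over characters / words.

import easyocr


def levenshtein(a, b):
    """Edit distance between two sequences (characters or words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def accuracy(reference, hypothesis):
    """1 - normalized edit distance, clipped at 0 for very poor output."""
    dist = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - dist / max(len(reference), 1))


# Recognize text: detail=0 makes readtext() return plain strings only.
reader = easyocr.Reader(['en'])
recognized = " ".join(reader.readtext("sample.png", detail=0))

reference = "Expected text of the test image"  # ground-truth transcription

car = accuracy(reference, recognized)                  # character level
war = accuracy(reference.split(), recognized.split())  # word level
print(f"CAR = {car:.3f}, WAR = {war:.3f}")
```

The same scoring routine could be reused unchanged with other engines (e.g. Tesseract via pytesseract), which is what makes a comparison across tools on images of varying complexity straightforward.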