Optical character recognition (OCR) is a powerful technology that converts images containing text into an editable and searchable format. It makes working with documents more efficient, increases the accessibility of information, and helps automate many processes. The technology was first used in the early 1990s, when historical newspapers were digitized to create electronic archives. In recent years, optical character recognition systems have been refined almost to the "ideal": current systems show near-perfect text recognition accuracy. However, this requires several conditions to be met: evenness and contrast of the characters, a uniform text background, and sufficient contrast between the background and the letters. When these conditions are not met, recognizing characters and text from images becomes much more difficult, and solving such problems is an urgent requirement for practical tasks in the military sphere. Text recognition from images has many important applications, which makes the topic relevant and necessary. The article analyzes several of the most well-known and popular artificial intelligence tools for text recognition from images, such as Tesseract OCR, PyTorch, EasyOCR, Keras-OCR, and OpenCV, on specific types of images, and obtains results for recognizing text and characters from images of varying complexity. To assess the accuracy of character and text recognition, a recognition accuracy assessment method has been developed based on special evaluation metrics that compare the recognized text with the reference (correct) text. The most common such metrics are the character recognition accuracy (CAR) and the word recognition accuracy (WAR). Using the developed method, the recognition accuracy of the most popular optical text and character recognition tools has been analyzed on images of varying complexity. The analysis showed that the EasyOCR model is the most efficient and accurate: even under strong "noise" and poor image contrast it produced a stable result and, with further customization for the user's needs, can be used to solve a specific task.
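For illustration, the following is a minimal sketch of how recognized text could be scored against a reference transcription in the spirit of the CAR and WAR metrics described above. It assumes (this is not taken from the article) that both metrics are computed as one minus a normalized Levenshtein distance at the character and word level, respectively; the image file name "sample.png" and the reference string are placeholders, and EasyOCR is used as the recognition engine.

```python
# Hedged sketch: run EasyOCR on an image and score the output against a
# reference transcription using character- and word-level accuracy.
# Assumption (not from the article): CAR and WAR are taken as
# 1 - normalized Levenshtein distance over characters / words.

import easyocr


def levenshtein(a, b):
    """Edit distance between two sequences (characters or words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def accuracy(reference, hypothesis):
    """1 - normalized edit distance, clipped at 0 for very poor output."""
    dist = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - dist / max(len(reference), 1))


# Recognize text: detail=0 makes readtext() return plain strings only.
reader = easyocr.Reader(['en'])
recognized = " ".join(reader.readtext("sample.png", detail=0))

reference = "Expected text of the test image"  # ground-truth transcription

car = accuracy(reference, recognized)                  # character level
war = accuracy(reference.split(), recognized.split())  # word level
print(f"CAR = {car:.3f}, WAR = {war:.3f}")
```

The same scoring routine could be reused unchanged with other engines (e.g. Tesseract via pytesseract), which is what makes a comparison across tools on images of varying complexity straightforward.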