Transformative Progress in Document Digitization: An In-Depth Exploration of Machine and Deep Learning Models for Character Recognition

Ali Benaissa, Abdelkhalak Bahri, My Abdelouahab Salahddine, Ahmad El Allaoui

doi:10.56294/dm2023174

Abstract

Introduction: This paper explores the effectiveness of character recognition models for document digitization, using a range of machine learning and deep learning techniques. Motivated by the growing relevance of image classification across applications, the study evaluates Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), and VGG16 with transfer learning. It uses a challenging French alphabet dataset of 82 classes to assess how well the models discern intricate patterns and generalize across diverse characters.

Objective: This study investigates the effectiveness of character recognition models for document digitization using diverse machine learning and deep learning techniques.

Methods: The methodology begins with data preparation: a merged dataset is created from distinct sections covering digits, French special characters, symbols, and the French alphabet. The dataset is then partitioned into training, test, and evaluation sets. Each model is trained and evaluated over a specific number of epochs. Accuracy, precision, recall, and F1-score are recorded for CNN, RNN, and VGG16, while SVM and KNN are evaluated on accuracy, macro average, and weighted average.

Results: The outcomes highlight distinct strengths and areas for improvement across the evaluated models. SVM achieves an accuracy of 98.63%, underlining its efficacy in character recognition. KNN reaches an overall accuracy of 97%, while the RNN model faces challenges in training and generalization. The CNN model reaches an accuracy of 97.268%, and VGG16 with transfer learning achieves notable enhancements, reaching accuracy rates of 94.83% on test images and 94.55% on evaluation images.

Conclusion: Our study evaluates the performance of five models (SVM, KNN, RNN, CNN, and VGG16 with transfer learning) on character recognition tasks. SVM and KNN demonstrate high accuracy, while RNN faces challenges in training. CNN excels in image classification, and VGG16 with transfer learning enhances accuracy significantly. This comparative analysis supports informed model selection for character recognition applications.
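
The abstract reports accuracy, precision, recall, and F1-score alongside macro and weighted averages, without saying how they were computed. The following is a minimal sketch of the standard definitions in plain Python, not the authors' evaluation code; the function names and the toy labels are assumptions made for illustration only.

```python
from collections import Counter

def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample was not p
            fn[t] += 1  # true class t was missed
    support = Counter(y_true)
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1, support[c])
    return scores

def summarize(y_true, y_pred):
    """Accuracy plus macro- and weighted-averaged F1."""
    scores = per_class_scores(y_true, y_pred)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Macro average: every class counts equally, however rare it is.
    macro_f1 = sum(f1 for _, _, f1, _ in scores.values()) / len(scores)
    # Weighted average: each class is weighted by its number of true samples.
    weighted_f1 = sum(f1 * s for _, _, f1, s in scores.values()) / n
    return {"accuracy": accuracy, "macro_f1": macro_f1, "weighted_f1": weighted_f1}

# Hypothetical toy labels (two classes), not taken from the paper's dataset.
y_true = ["a", "a", "a", "é"]
y_pred = ["a", "a", "é", "é"]
print(summarize(y_true, y_pred))
```

With many classes of unequal size, the macro average is pulled down by poorly recognized rare characters, whereas the weighted average tracks overall accuracy more closely, so the two are usually read together.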

Journal: Data and Metadata
Publication Date: Dec 27, 2023
License type: CC BY 4.0

Similar Papers

A comprehensive review of external quality measurements of fruits and vegetables using nondestructive sensing technologies
Tanjima Akter ... Byoung-Kwan Cho
Journal of Agriculture and Food Research | VOL. 15
23 Feb 2024

Comprehensive Study for Breast Cancer Using Deep Learning and Traditional Machine Learning
ZANCO JOURNAL OF PURE AND APPLIED SCIENCES | VOL. 34
12 Apr 2022

COVID‐19: A systematic review of prediction and classification techniques
Om Ramakisan Varma ... Mala Kalra
International Journal of Imaging Systems and Technology | VOL. 33
11 May 2023

Handwritten Urdu character recognition via images using different machine learning and deep learning techniques
M Ameen Chhajro
Indian Journal of Science and Technology | VOL. 13
08 May 2020
