IMPROVED OCR QUALITY FOR SMART SCANNED DOCUMENT MANAGEMENT SYSTEM

Viet Anh Pham

doi:10.56651/lqdtu.jst.v9.n01.60.ict

Abstract

The quality of document images is a crucial factor in the performance of an Optical Character Recognition (OCR) model. Several properties of the input data hinder recognition, such as heterogeneous layouts, skew, and proportional fonts. This paper investigated several data pre-processing algorithms, including image deskewing and table and document layout analysis, to improve the accuracy of the OCR model, and then built an end-to-end scanned document management system. We verified the algorithms using Tesseract, a well-known OCR software package. Experiments on a real dataset showed that our methods can accurately process document images with arbitrary rotation angles and different layouts. As a result, the word-level accuracy of Tesseract improves by 23% for documents with complex structures. The quality of the output text makes it possible to build a system that stores and searches documents efficiently.
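
The abstract reports accuracy "by words" but does not define the metric here. As a rough illustration only, the sketch below assumes word-level accuracy means the fraction of ground-truth words that the OCR output reproduces in the same order. The function name `word_accuracy` and the sample strings are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Fraction of reference words reproduced, in order, by the OCR output.

    Assumption: this is one plausible reading of "accuracy by words";
    the paper may define the metric differently.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        # An empty reference is matched perfectly only by an empty output.
        return 1.0 if not hyp_words else 0.0
    # Align the two word sequences and count the words in matching blocks.
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


if __name__ == "__main__":
    # Hypothetical ground truth and OCR outputs, for illustration only.
    truth = "the quick brown fox jumps over the lazy dog"
    raw_ocr = "the qu1ck brown f0x jumps ovr the lazy dog"
    preprocessed_ocr = "the quick brown fox jumps over the lazy dog"
    print(f"without pre-processing: {word_accuracy(truth, raw_ocr):.2f}")
    print(f"with pre-processing:    {word_accuracy(truth, preprocessed_ocr):.2f}")
```

Comparing this score for the same page before and after pre-processing gives the kind of improvement figure the abstract quotes. Whether the reported 23% is an absolute or a relative gain is not stated in the abstract.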

Published in: Journal of Science and Technique

Similar Papers

Smart Parking System based on Improved OCR Model
Rami Bassam ... Fars Samann
IOP Conference Series: Materials Science and Engineering | VOL. 978
01 Nov 2020

A Smart Visual Sensing Concept Involving Deep Learning for a Robust Optical Character Recognition under Hard Real-World Conditions.
Kabeh Mohsenzadegan ... Vahid Tavakkoli
Sensors (Basel, Switzerland) | VOL. 22
12 Aug 2022

Towards a Higher Accuracy of Optical Character Recognition of Chinese Rare Books in Making Use of Text Model
Hsiang-An Wang ... Pin-Ting Liu
08 May 2019

Unsupervised OCR Model Evaluation Using GAN
Abhash Sinha ... Syed Saqib Bukhari
01 Sep 2019
