Methods and algorithms for protecting information in optical text recognition systems

Konstantin Dergachov, Anatoly Zymovin, Leonid Krasnov, Vladislav Bilozerskyi

doi:10.32620/reks.2022.1.12

Abstract

The subject of the study. A concept for improving the performance of OCR systems is proposed, based on the integrated use of special algorithms for preprocessing document images, an extended set of service functions, and advanced information-protection techniques. Study objectives: to develop algorithms that compensate for unfavorable factors that corrupt the imaged text and reduce recognition efficiency, such as imperfect lighting conditions, overshooting, geometric deformation of the image, and noise. A further objective is to provide a set of service procedures that ensure adequate data handling when viewing, converting, and storing the results in standard formats, and that make it possible to exchange data over open communication networks. In addition, information must be protected against unauthorized use during data processing, and its transmission over communication channels must be kept secret. Investigation methods and research results: algorithms for preprocessing image data were developed and tested, namely geometric transformation of the captured image, noise correction with various filters, and image binarization with adaptive thresholds, which reduced the negative influence of uneven illumination across image regions. Special services implemented in the software make data processing easier and provide information protection. In particular, an interactive text-segmentation procedure is proposed that allows selected text fragments to be anonymized, helping to preserve the confidentiality of the processed documents. The package for processing document photographs also includes a face-detection algorithm that identifies such information features; it can be used further for face recognition.
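Binarization with adaptive thresholds can be illustrated with a minimal sketch. The abstract does not give the exact method or its parameters, so this assumes the common mean-based variant: a pixel becomes white if it is brighter than the mean of its local block_size × block_size window minus a constant c. In OpenCV a comparable call is `cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, block_size, c)`. The plain-Python version below avoids that dependency and clips the window at the image borders (OpenCV replicates border pixels instead), so results near the edges may differ slightly. The names `adaptive_threshold` and `make_test_page` and the values block_size=15, c=15 are hypothetical, chosen for illustration.

```python
"""Adaptive mean thresholding (binarization) in plain Python.

Assumptions (not taken from the paper): a grayscale image is a list of
rows of ints in 0..255; block_size and c are illustrative values.
"""


def integral_image(img):
    """Summed-area table with one extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        running = 0  # sum of row y, columns 0..x
        for x in range(w):
            running += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + running
    return ii


def adaptive_threshold(img, block_size=15, c=15):
    """Return 255 where a pixel is brighter than (local mean - c), else 0.

    The local mean is taken over a block_size x block_size window centred
    on the pixel; near the borders the window is clipped to the image.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd integer >= 3")
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    r = block_size // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(255 if img[y][x] > mean - c else 0)
        out.append(row)
    return out


def make_test_page(w=60, h=20):
    """Synthetic page: brightness ramps 60 -> 220 left to right (uneven
    lighting), with three dark 2-pixel-wide vertical 'strokes'."""
    img = [[60 + (160 * x) // (w - 1) for x in range(w)] for _ in range(h)]
    strokes = {10, 11, 30, 31, 50, 51}
    for y in range(3, 17):
        for x in strokes:
            img[y][x] -= 50
    return img, strokes


if __name__ == "__main__":
    page, strokes = make_test_page()
    binary = adaptive_threshold(page)
    global_binary = [[255 if v > 128 else 0 for v in row] for row in page]
    print("adaptive, left background:", binary[0][0])
    print("global 128, left background:", global_binary[0][0])
```

On this synthetic page a single global threshold of 128 turns the dimly lit left-hand background black, whereas the local threshold keeps the background white and the strokes black across the whole brightness ramp.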
After a text document is recognized, the resulting data are encrypted by generating a QR code, and steganography methods can ensure the privacy of transmitting this information. The structure of the algorithms is described in detail, and their stability under various conditions is investigated. As a case study, software for recognizing document text was developed using the Tesseract 4.0 optical character recognition engine. The program, named "HQ Scanner", is written in Python using the resources of the OpenCV library. The software also implements an original technique for evaluating algorithm efficiency based on the criterion of maximum probability of correct text recognition. Conclusions. The results of the study can serve as a basis for developing advanced specialized software for easy-to-use commercial OCR systems.
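The abstract states that steganography can keep the transmission of the recognized data private but does not name a scheme. As an assumption, the sketch below uses the simplest widely known one, least-significant-bit (LSB) substitution: each bit of the message overwrites the lowest bit of one 8-bit pixel value, so no pixel changes by more than 1. A 32-bit big-endian length header (also an assumption) tells the receiver how many bytes to read back. LSB embedding only hides data and does not encrypt it, so in the pipeline described above it would be applied to data that has already been encrypted. The function names `embed_lsb` and `extract_lsb` are hypothetical.

```python
"""Least-significant-bit (LSB) steganography on a list of 8-bit pixel values.

Assumptions (not from the paper): the cover image is a flat list of ints in
0..255, and the payload length is stored in a 32-bit big-endian header.
"""


def _bits(data):
    """Yield the bits of `data` (bytes), most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def embed_lsb(pixels, payload):
    """Return a copy of `pixels` whose LSBs carry a length header + payload."""
    message = len(payload).to_bytes(4, "big") + payload
    bits = list(_bits(message))
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for this payload")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # overwrite the lowest bit only
    return stego


def _read_bytes(pixels, start_bit, count):
    """Reassemble `count` bytes from the LSBs starting at `start_bit`."""
    out = bytearray()
    for b in range(count):
        value = 0
        for k in range(8):
            value = (value << 1) | (pixels[start_bit + 8 * b + k] & 1)
        out.append(value)
    return bytes(out)


def extract_lsb(pixels):
    """Recover the payload hidden by embed_lsb."""
    length = int.from_bytes(_read_bytes(pixels, 0, 4), "big")
    if 32 + 8 * length > len(pixels):
        raise ValueError("no valid payload found")
    return _read_bytes(pixels, 32, length)


if __name__ == "__main__":
    cover = [(37 * i) % 256 for i in range(1000)]  # stand-in for image data
    secret = "Recognized text: contract No. 17".encode("utf-8")
    stego = embed_lsb(cover, secret)
    print(extract_lsb(stego).decode("utf-8"))
    print("max pixel change:", max(abs(a - b) for a, b in zip(cover, stego)))
```

With a real image, the cover values would come from the pixel array (for example, a flattened OpenCV/NumPy array), and the stego image has to be saved in a lossless format such as PNG, because lossy JPEG compression alters the low-order bits and destroys the hidden message.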

Journal: RADIOELECTRONIC AND COMPUTER SYSTEMS
Publication Date: Feb 23, 2022
License type: cc-by-nc

Similar Papers

Use of Semantic Segmentation for Increasing the Throughput of Digitisation Workflows for Natural History Collections
Abraham Nieva De La Hidalga ... Xianfang Sun
Biodiversity Information Science and Standards | VOL. 3 | 18 Jun 2019

Probabilistic management of OCR data using an RDBMS
Arun Kumar ... Christopher Ré
Proceedings of the VLDB Endowment | VOL. 5 | 01 Dec 2011

Simple and Efficient Method for Region of Interest Value Extraction from Picture Archiving and Communication System Viewer with Optical Character Recognition Software and Macro Program
Young Han Lee ... Jin-Suck Suh
Academic Radiology | VOL. 22 | 12 Aug 2014

Implementation of OCR (Optical Character Recognition) Using Tesseract in Detecting Character in Quotes Text Images
Ikha Novie Tri Lestari ... Dadang Iskandar Mulyana
Journal of Applied Engineering and Technological Science (JAETS) | VOL. 4 | 02 Sep 2022
