Abstract

The purpose of modeling optical character recognition (OCR) is to improve the quality of document classification. Non-digital documents, such as scanned or photographed documents, are difficult to classify correctly in electronic document management systems. For this reason, the OCR process is modeled for the regulatory documents of an organization. Various methods of modeling the process are considered. The structure of departments for the electronic document management system is given. Methods of implementing OCR are considered. The stages of OCR system development are identified: image processing, segmentation, and recognition. The methods of image processing are analyzed, and the main image processing operations are described: alignment, blurring, binarization, contour detection, and removal of extra lines. Image blurring methods are compared. Two stages of image binarization are defined: conversion of a color image into a grayscale image, and binarization of the grayscale image. The Canny operator, which detects boundaries in the image, is proposed for the second stage of binarization. The last stage of image processing is the removal of extra lines. Algorithms for dividing text areas into segments are considered. Three stages of segmentation are identified: line segmentation, word segmentation, and character segmentation. A segmentation algorithm is defined that calculates the average brightness of image pixels in order to find the different intervals: line spacing, word spacing, and character spacing. Popular online OCR services and some popular desktop programs are reviewed. A connection between artificial neural networks and optical object recognition is identified. An artificial neural network is proposed for implementing the recognition stage.
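The brightness-based segmentation mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the list-of-rows grayscale image representation, and the `gap_level` threshold of 250 are all assumptions made for illustration. The idea shown is that a row (or column) whose average brightness is close to white is treated as spacing, and each contiguous run of darker rows (or columns) is treated as a segment.

```python
from typing import List, Tuple

# Assumed representation: rows of brightness values 0..255, 255 = white background.
Gray = List[List[int]]


def mean_brightness(values: List[int]) -> float:
    """Average brightness of a non-empty row or column of pixels."""
    return sum(values) / len(values)


def find_runs(profile: List[float], gap_level: float) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the profile is darker than gap_level."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, level in enumerate(profile):
        if level < gap_level and start is None:
            start = i                      # a dark run (text) begins
        elif level >= gap_level and start is not None:
            runs.append((start, i))        # a bright gap (spacing) ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image: Gray, gap_level: float = 250.0) -> List[Tuple[int, int]]:
    """Split the page into text lines using the average brightness of each row."""
    row_profile = [mean_brightness(row) for row in image]
    return find_runs(row_profile, gap_level)


def segment_columns(image: Gray, top: int, bottom: int,
                    gap_level: float = 250.0) -> List[Tuple[int, int]]:
    """Split one text line (rows top..bottom) using the average brightness of each column."""
    band = image[top:bottom]
    if not band:
        return []
    width = len(band[0])
    col_profile = [mean_brightness([row[x] for row in band]) for x in range(width)]
    return find_runs(col_profile, gap_level)
```

In this sketch the same column profile would serve both word and character segmentation. One plausible way to tell the two apart, again an assumption rather than something stated in the abstract, is to compare the width of each bright gap against a threshold, so that wide gaps separate words and narrow gaps separate characters.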
