Hybrid model for Chinese character recognition based on Tesseract-OCR

Yi Wei Ma, Hong Tao Hu, Bo Wang

doi:10.1504/ijipt.2020.10027242

Abstract

Optical character recognition (OCR) is an important way to input information into a computer, since it can extract text from an image. The accuracy of Chinese OCR, however, can still be improved. This study proposes a hybrid Chinese character recognition model based on the characteristics of Chinese. Before the OCR engine runs, the model filters out interfering information in the image and then adjusts the aspect ratio of the characters. After the image has been recognized by the OCR engine, which yields a character-by-character result, that result is checked and corrected at the phrase level. Experimental results show that the hybrid model improves the accuracy of Chinese OCR: image processing raises the recognition accuracy of the Tesseract-OCR engine by about 12%, and natural language processing improves the accuracy of the recognition result by about 5%.
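The abstract does not say how interfering information is filtered out before recognition. One common approach, shown here only as a minimal sketch, is to binarize the image and erase small connected blobs of ink (speckle noise). The functions `binarize` and `remove_small_components`, the threshold of 128, and the `min_size` cutoff are illustrative assumptions, not the authors' method; the image is a plain list of rows of 0-255 grayscale values so that no imaging library is needed.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map grayscale rows (0-255) to 1 = ink, 0 = background.

    Assumes dark text on a light background; the threshold is illustrative.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_small_components(binary, min_size=3):
    """Erase 8-connected ink blobs smaller than min_size pixels (speckle noise)."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects every pixel of one component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and out[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the size cutoff are treated as interference.
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


# Usage: a 2x2 stroke plus an isolated dark pixel at the right edge.
gray = [
    [255, 255, 255, 255, 255],
    [255, 0, 0, 255, 255],
    [255, 0, 0, 255, 20],
    [255, 255, 255, 255, 255],
]
clean = remove_small_components(binarize(gray), min_size=3)
for row in clean:
    print(row)
```

Here the stray single-pixel dot is removed while the four-pixel stroke survives. In practice the cutoff would need tuning so that genuine small strokes, such as the dot strokes in 主 or 头, are not erased.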
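The abstract also gives no details of the aspect-ratio adjustment. Because Chinese characters are generally written to fill a roughly square cell, one plausible sketch is to crop each character image to its ink and resample it onto a square grid. The helpers `bounding_box` and `normalize_aspect`, the nearest-neighbour resampling, and the default `size` of 32 are hypothetical choices made for illustration.

```python
def bounding_box(binary):
    """Return (top, bottom, left, right) of the ink pixels, inclusive, or None."""
    if not binary or not binary[0]:
        return None
    rows = [y for y, row in enumerate(binary) if any(row)]
    cols = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    if not rows:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]


def normalize_aspect(binary, size=32):
    """Crop one character to its ink and resample it to size x size.

    Nearest-neighbour sampling stretches the glyph to a 1:1 aspect ratio.
    """
    box = bounding_box(binary)
    if box is None:
        return [[0] * size for _ in range(size)]
    top, bottom, left, right = box
    glyph = [row[left:right + 1] for row in binary[top:bottom + 1]]
    gh, gw = len(glyph), len(glyph[0])
    # Map each output cell back to the nearest source cell of the cropped glyph.
    return [[glyph[y * gh // size][x * gw // size] for x in range(size)]
            for y in range(size)]


# Usage: a tall, narrow glyph (4 rows x 2 columns of ink) becomes a 4 x 4 square.
tall = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
for row in normalize_aspect(tall, size=4):
    print(row)
```

Stretching to a square distorts naturally narrow glyphs; padding the shorter side instead would preserve proportions. The abstract does not state which behaviour the model uses.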
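The phrase-level check can be illustrated with a small greedy corrector: each recognized character may be swapped for a visually similar alternative if the swap increases the number of adjacent two-character sequences found in a phrase dictionary. This is a swapped-in illustration, not the authors' natural language processing component, which the abstract does not describe. The lexicon, the confusion set (here the commonly confused pair 末/未), and the function names `count_known` and `correct_phrases` are all hypothetical.

```python
def count_known(text, lexicon):
    """Count adjacent two-character windows that are dictionary phrases."""
    return sum(1 for i in range(len(text) - 1) if text[i:i + 2] in lexicon)


def correct_phrases(text, lexicon, confusions):
    """Greedy phrase-level correction.

    Each character may be replaced by a visually similar alternative from
    `confusions` if that raises the number of known two-character phrases.
    """
    chars = list(text)
    for i in range(len(chars)):
        original = chars[i]
        best_char = original
        best_score = count_known("".join(chars), lexicon)
        for alt in confusions.get(original, []):
            chars[i] = alt
            score = count_known("".join(chars), lexicon)
            if score > best_score:
                best_char, best_score = alt, score
        chars[i] = best_char
    return "".join(chars)


# Hypothetical resources: a tiny phrase lexicon and one look-alike pair.
LEXICON = {"未来", "发展"}
CONFUSIONS = {"末": ["未"], "未": ["末"]}

# Usage: OCR misread 未 ("not yet") as the similar-looking 末 ("end").
print(correct_phrases("末来发展", LEXICON, CONFUSIONS))  # → 未来发展
```

Scoring the whole string after each candidate swap keeps already-correct text unchanged, because a swap is accepted only when it strictly increases the number of known phrases.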

Published in: International Journal of Internet Protocol Technology

Similar Papers

A new cognitive model for Chinese character description and recognition
Jian-Qin Liu ... Wei Li
05 Dec 1994

Application of Geometry Rectification to Deformed Characters Recognition
Honghui Fan ... Liqun Wang
01 Jan 2015

An improved method on Chinese character recognition
Ke-Jian Wang ... Yan Zhao
02 Nov 2003

Enhanced ResNet-151-based fused features for optimized Bi-LSTM-DNN-aided handwritten character and digits recognition
Srinivasa Rao N ... Nelson Kennedy Babu C
Expert Systems with Applications | VOL. 244
08 Dec 2023