Thinning Chinese, Korean, Japanese and Thai script for segmentation-free OCRs

Abdul Majid, Dil Nawaz Hakro, Saba Brahmani, Qinbo

doi:10.32628/cseit2410111

Abstract

An internet search for the keyword "OCR" returns thousands of research papers on optical character recognition. These papers cover scripts ranging from Latin, Cyrillic and Devanagari to Korean, Japanese, Chinese and Arabic. Sindhi and many other languages extend the Arabic script: the base characters are the same, while the remaining characters are adopted in the same manner. Many languages already have OCRs, but some others still require one. The paper is organized into sections: an introduction, followed by Sindhi language characteristics, an explanation of OCR approaches and methods, and a final section giving the conclusion and future work. An OCR is a set of complex steps that converts text in an image to editable text. Skeletonization, or thinning, shrinks the body of a word or character and helps to recognize text more easily. Different languages pose different challenges and are hard to recognize; skeletonization or thinning produces a new image that is easier to recognize. The connected elements are found with this approach. Custom-built software has been developed as an interface to the generalized thinning algorithm so that Chinese, Japanese, Korean and Thai scripts can be tested. The output of this algorithm is the final image used for further OCR processing. Although the intention was to create algorithms for segmentation-free OCRs, the study results and the software can also be used for segmentation-based algorithms. The generalized algorithm shows an accuracy of more than 95% for the four scripts tested.
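The abstract does not specify the generalized thinning algorithm, so the following is only a minimal sketch of what thinning does, using the well-known Zhang-Suen algorithm as a stand-in. The function name `zhang_suen_thin` and the input convention (a list of rows holding 0 for background and 1 for ink) are assumptions made for illustration, not the authors' software.

```python
def zhang_suen_thin(image):
    """Thin a binary image (rows of 0/1, 1 = ink) with Zhang-Suen.

    Illustrative stand-in only: the paper's own generalized
    algorithm is not described in the abstract.
    """
    img = [list(row) for row in image]  # work on a copy
    rows, cols = len(img), len(img[0])
    # Neighbours P2..P9, clockwise, starting from the pixel above.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]

    def neighbours(r, c):
        # Pixels outside the image are treated as background.
        return [img[r + dr][c + dc]
                if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
                for dr, dc in offsets]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of each pass
            to_clear = []
            for r in range(rows):
                for c in range(cols):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of ink neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(1 for i in range(8)
                            if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((r, c))
            # Delete flagged pixels together, after the full scan.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# Usage: a three-pixel-thick horizontal stroke on a 5 x 7 canvas.
bar = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
skeleton = zhang_suen_thin(bar)
for row in skeleton:
    print("".join("#" if v else "." for v in row))
```

Each pass only removes boundary pixels whose deletion does not split the stroke, so the connected structure of the character survives while its thickness shrinks. For real scanned pages, a library routine such as scikit-image's `skeletonize` would likely be a more practical choice than this pure-Python loop.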

Journal: International Journal of Scientific Research in Computer Science, Engineering and Information Technology
Publication Date: Jan 1, 2024
License type: cc-by
