Offline OCR System for Machine-Printed Turkish Using Template Matching

Ahmed Dena Rafaa, Jan Nordin

doi:10.4028/www.scientific.net/amr.341-342.565

Abstract

Optical character recognition (OCR) is one of the most important applications of pattern recognition (PR) today: it converts scanned images of printed or handwritten text into a machine-readable, editable format such as a text document. The main motivation of this study is to build an OCR system for offline machine-printed Turkish characters that converts an image file into readable, editable text. The system begins with a preprocessing step that converts the image file into a binary format with less noise, ready for recognition; preprocessing includes digitization, binarization, thresholding, and noise removal. Next, the horizontal projection method is used for line detection and word allocation, and an 8-connected neighbors scheme is used to extract characters as a set of connected components. The template matching method then matches the segmented characters against the template set stored in the OCR database in order to recognize the text. Unlike other approaches, template matching takes less time and does not require training samples, but it cannot recognize some letters with similar shapes or combined letters. For this reason, the OCR system combines template matching with the size feature of the segmented characters to achieve accurate results. Finally, upon successful recognition, the recognized patterns are displayed in Notepad as readable, editable text. The Turkish machine-printed database consists of a list of 630 names of cities in Turkey written in the Arial font at different sizes, in uppercase, in lowercase, and with the first character of each word capitalized. The results show that the accuracy of the proposed OCR system ranges from 96% to 100%.
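
The stages named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the fixed threshold of 128, the toy three-glyph template set, and the use of an exact bounding-box size comparison as the "size feature" are all assumptions made for illustration. Noise removal and word allocation are omitted, and the dots and diacritics of Turkish letters such as i, ö, ü, ş, ç, and ğ would come out as separate connected components that this sketch does not merge.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Global thresholding (assumed value): dark pixels become ink (1)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def find_text_lines(binary):
    """Horizontal projection: runs of rows containing ink form one text line."""
    profile = [sum(row) for row in binary]
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y
        elif count == 0 and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines  # list of (top, bottom) with bottom exclusive

def connected_components(binary):
    """Bounding boxes of 8-connected ink regions, ordered left to right."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):        # visit all 8 neighbours
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom + 1, right + 1))
    return sorted(boxes, key=lambda box: box[1])

def crop(binary, box):
    top, left, bottom, right = box
    return [row[left:right] for row in binary[top:bottom]]

def match(glyph, templates):
    """Best same-size template by pixel agreement, or None if no size fits."""
    best_label, best_score = None, -1.0
    for label, tmpl in templates.items():
        # Assumed "size feature": only compare templates of identical size.
        if len(tmpl) != len(glyph) or len(tmpl[0]) != len(glyph[0]):
            continue
        agree = sum(a == b for g_row, t_row in zip(glyph, tmpl)
                    for a, b in zip(g_row, t_row))
        score = agree / (len(tmpl) * len(tmpl[0]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def recognize(gray, templates):
    binary = binarize(gray)
    text_lines = []
    for top, bottom in find_text_lines(binary):
        band = binary[top:bottom]
        chars = [match(crop(band, box), templates) or "?"
                 for box in connected_components(band)]
        text_lines.append("".join(chars))
    return "\n".join(text_lines)

# Toy input: '#' is ink. Templates are tightly cropped, hypothetical glyphs.
picture = [
    "..........",
    ".#.#..###.",
    ".#.#...#..",
    ".#.##..#..",
    "..........",
]
gray = [[0 if ch == "#" else 255 for ch in row] for row in picture]
templates = {
    "I": [[1], [1], [1]],
    "L": [[1, 0], [1, 0], [1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
print(recognize(gray, templates))  # prints ILT
```

In this sketch a segmented glyph is only compared against templates with the same bounding-box dimensions, which is one simple way to let size break ties between similarly shaped letters. A practical system would more likely normalize glyphs to a common size and use a tolerance rather than exact equality.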

Journal: Advanced Materials Research
Publication Date: Sep 27, 2011
Citations: 4
