Farsi document image recognition system using word layout signature

Cem Ergün, Sajedeh Norozpour

doi:10.3906/elk-1804-92

Abstract

In this paper, a new representation of Farsi words is proposed to address the keyword spotting problem in Farsi document image retrieval. We define a signature for each Farsi word based on the layout of its connected components. The signature is represented as boxes; by drawing vertical and horizontal lines, we then construct a grid for each word to provide a new descriptor. One advantage of this method is that it can be used for both handwritten and machine-printed text. To evaluate the performance of the system in comparison to other methods, a database of 19,582 printed Farsi words is examined; with this approach, a recall rate of 98.1% and a precision rate of 94.3% are obtained.
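The abstract does not give the exact construction of the descriptor, so the following is only a minimal sketch of one way a box-and-grid layout signature could be computed. It assumes a binarized word image given as a list of rows of 0/1 values, uses 8-connectivity, and places grid lines at the edges of each component's bounding box. The function names `connected_components` and `layout_signature` are hypothetical and are not taken from the paper.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected blobs.

    `img` is a list of rows holding 0 (background) or 1 (ink).
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def layout_signature(img):
    """Assumed descriptor: a grid whose lines run along the box edges.

    A cell is 1 if it lies inside some component's bounding box, else 0.
    Only the relative arrangement of boxes is kept, not pixel sizes.
    """
    boxes = connected_components(img)
    # Horizontal and vertical grid lines at every box edge (half-open ends).
    ys = sorted({box[0] for box in boxes} | {box[2] + 1 for box in boxes})
    xs = sorted({box[1] for box in boxes} | {box[3] + 1 for box in boxes})
    grid = []
    for y0, y1 in zip(ys, ys[1:]):
        row = []
        for x0, x1 in zip(xs, xs[1:]):
            inside = any(
                top <= y0 and y1 <= bottom + 1 and left <= x0 and x1 <= right + 1
                for top, left, bottom, right in boxes
            )
            row.append(1 if inside else 0)
        grid.append(tuple(row))
    return tuple(grid)


# Toy "word": a dot above a wider body stroke.
toy_word = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
]
signature = layout_signature(toy_word)
```

Because the grid records only the relative arrangement of the boxes, shifting the word within the image leaves this signature unchanged. How the paper compares two signatures during retrieval is not stated in the abstract and is not sketched here.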

Highlights

Due to the increase in digital libraries and paper documents in offices, their organization and management take significant amounts of time and energy.
To search for a keyword in document images, the images must first be converted by optical character recognition (OCR) from pictorial format to a machine-readable text format [1]; the target word is then sought in the text using traditional document retrieval methods.
Although OCR is frequently used by researchers in this area, it has some disadvantages that make it inappropriate for some retrieval cases.

Summary

Introduction

Due to the increase in digital libraries and paper documents in offices, their organization and management take significant amounts of time and energy.

The upper contours of words are extracted and a picture dictionary of these features is made, and each subword is shown as a combination of contour strokes that includes upper, lower, and middle positions of the baseline. As another example, the work proposed in [23] relies on shape features of printed words to recognize Arabic texts written in three different fonts, two of which are synthetic.

Following the literature review above and considering the method discussed in [2, 3], in this paper we propose a new model for machine-printed Farsi text retrieval based on similarities in the layout of components in Farsi words. The remainder of this paper is organized as follows: Section 2 describes our proposed method, Section 3 summarizes the experimental results, and, lastly, Section 4 presents the conclusions of this paper.

Preprocessing

Experimental results

Conclusion


Journal: TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES	Publication Date: Mar 1, 2019
Citations: 3	License type: cc-by
