Script-Independent Text Segmentation from Document Images

Parul Sahare, Mayur R Parate, Tausif Diwan, Sanjay B Dhok, Jitendra V Tembhurne

doi:10.4018/ijaci.313967

Abstract

Document image analysis is widely used for information retrieval, with applications including optical character recognition (OCR), indexing of digital libraries, and web image processing. Text segmentation is one of the important steps in this field, and it becomes complicated for documents containing unevenly spaced text and characters of varying font sizes. This paper presents script-independent algorithms for text-line segmentation and word segmentation. The fast marching method is used for text-line segmentation, where it acts as a region-growing process that detects potential text-lines. Word segmentation combines the wavelet transform with connected components (CCs) labeling: an energy map is calculated using the wavelet transform to create text-blocks. Both proposed algorithms are evaluated on different databases containing documents in different scripts, and the highest accuracies obtained are 98.9% for text-line segmentation and 99.1% for word segmentation.
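The word-segmentation idea summarized in the abstract (a wavelet energy map that forms text-blocks, followed by CC labeling) can be illustrated with a minimal sketch. This is not the authors' implementation: the one-level Haar wavelet, the relative threshold `frac`, the horizontal gap-filling step `smear_rows`, and every function name below are assumptions chosen only to make the idea concrete.

```python
from collections import deque

def haar_energy(img):
    """Detail energy of a one-level 2D Haar transform, one value per 2x2 block.
    `img` is a grayscale image given as a list of equal-length rows."""
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    energy = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            lh = (a + b - c - d) / 2.0  # responds to horizontal edges
            hl = (a - b + c - d) / 2.0  # responds to vertical edges
            hh = (a - b - c + d) / 2.0  # responds to diagonal detail
            row.append(lh * lh + hl * hl + hh * hh)
        energy.append(row)
    return energy

def smear_rows(mask, max_gap):
    """Fill horizontal gaps of at most `max_gap` cells between text cells
    (assumed step that merges neighbouring strokes into one text-block)."""
    out = [row[:] for row in mask]
    for row in out:
        last = None  # column of the most recent text cell in this row
        for x, is_text in enumerate(row):
            if is_text:
                if last is not None and 0 < x - last - 1 <= max_gap:
                    for k in range(last + 1, x):
                        row[k] = True
                last = x
    return out

def word_boxes(img, frac=0.25, max_gap=1):
    """Return sorted word boxes (x0, y0, x1, y1) in pixel coordinates."""
    energy = haar_energy(img)
    peak = max(max(r) for r in energy)
    mask = smear_rows([[e > frac * peak for e in r] for r in energy], max_gap)
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first search labels one 4-connected component.
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                y, x = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Each energy cell covers a 2x2 pixel block, so scale back up.
            boxes.append((2 * x0, 2 * y0, 2 * x1 + 1, 2 * y1 + 1))
    return sorted(boxes)
```

In this sketch `max_gap` plays the role of the inter-character versus inter-word spacing decision: gaps no wider than `max_gap` energy cells are treated as belonging to the same word, so the value would need tuning to the spacing of the document at hand.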

Journal: International Journal of Ambient Computing and Intelligence
Publication Date: Nov 17, 2022
Citations: 1
