A word extraction algorithm for machine-printed documents using a 3D neighborhood graph model

Hwan-Chul Park, Se-Young Ok, Hwan-Gue Cho, Young-Jung Yu

doi:10.1007/pl00010903

Abstract

Automatic character recognition and image understanding of paper documents are central objectives of computer vision. A basic step in both problems is to isolate characters and then group the isolated characters into words. In this paper, we propose a new method for extracting characters from a mixed text/graphic machine-printed document and an algorithm for grouping the isolated characters into words. To extract characters, we exploit several character features (size, elongation, and density) and propose a characteristic value for classification based on the run-length frequency of the image component. Previous work on word grouping has largely been concerned with words placed on a horizontal or vertical line. Our word grouping algorithm can also group words that lie on inclined, intersecting, and even curved lines. To do this, we introduce the 3D neighborhood graph model, which is useful and efficient for both character classification and word grouping. In this model, each connected component of a text image segment is mapped onto 3D space according to the area of its bounding box and its position in the document. We tested the method on more than 20 English documents and more than 10 oriental documents scanned from books, brochures, and magazines. Experimental results show that more than 95% of words are successfully extracted from general documents, even from very complicated oriental documents.
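
The abstract does not give the exact construction of the 3D neighborhood graph, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes that each connected component is placed at the center of its bounding box, with the square root of the box area as the third coordinate, and that two components are neighbors when their 3D Euclidean distance is at most a chosen radius. The names Box, to_3d, neighborhood_graph, word_groups, and the radius parameter are all hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (hypothetical type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def to_3d(box):
    """Map a box to (center x, center y, sqrt(area)).

    Using sqrt(area) as the third coordinate is an assumption made here
    so that all three axes are in pixel units.
    """
    return (box.x + box.w / 2, box.y + box.h / 2, math.sqrt(box.w * box.h))


def neighborhood_graph(boxes, radius):
    """Link every pair of components whose 3D points lie within `radius`."""
    points = [to_3d(b) for b in boxes]
    edges = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= radius:
            edges[i].add(j)
            edges[j].add(i)
    return edges


def word_groups(edges):
    """Return the connected components of the graph as sorted index lists."""
    seen = set()
    groups = []
    for start in sorted(edges):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for other in edges[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


boxes = [
    Box(0, 10, 10, 10),    # three similar-sized characters on one line
    Box(12, 10, 10, 10),
    Box(24, 10, 10, 10),
    Box(20, 0, 40, 40),    # large graphic overlapping the last character
    Box(300, 10, 10, 10),  # isolated character far to the right
]
print(word_groups(neighborhood_graph(boxes, radius=15.0)))
```

In this toy input the large graphic is close to the third character in the image plane, but its much larger area pushes it away along the third axis, so it does not join the character group. Because neighbors are found by distance rather than by scanning along a fixed axis, the same linking step applies to characters on inclined or curved baselines.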

Journal: International Journal on Document Analysis and Recognition
Publication Date: Dec 1, 2001
Citations: 19

