A Unified Algorithm for Identification of Various Tabular Structures from Document Images

Sekhar Mandal, Amit K Das, Bhabatosh Chanda, Partha Bhowmick

doi:10.4018/jdls.2011040103

Abstract

This paper presents a unified algorithm for segmentation and identification of various tabular structures from document page images. Such tabular structures include conventional tables and displayed math-zones, as well as Table of Contents (TOC) and Index pages. After analyzing the page composition, the algorithm first classifies the input document pages into tabular and non-tabular pages. A tabular page contains at least one tabular structure, whereas a non-tabular page contains none. The approach is unified in the sense that it identifies all tabular structures on a tabular page, which considerably simplifies document image segmentation in a novel manner. Such unification also speeds up the segmentation process, because existing methodologies treat different tabular structures as separate physical entities and therefore produce time-consuming solutions. Distinguishing features of the different kinds of tabular structures are used in stages to keep the algorithm simple and efficient, as demonstrated by exhaustive experimental results.
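The first stage described above, separating tabular pages from non-tabular ones, can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name the features used, so the cue chosen here (wide inter-word gaps that line up across consecutive text lines) is an assumption, as are the names `TextLine`, `wide_gaps`, `tabular_blocks`, `is_tabular_page` and the thresholds `min_gap` and `min_rows`. The later stage that tells tables, math-zones, TOC and Index pages apart is left out for the same reason.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Word = Tuple[int, int]  # (x_start, x_end) of a word box in pixels (assumed input format)


@dataclass
class TextLine:
    """One text line of a page, given as horizontal extents of its words."""
    words: List[Word]


def wide_gaps(line: TextLine, min_gap: int = 30) -> List[Tuple[int, int]]:
    """Return the (start, end) extents of inter-word gaps at least min_gap wide."""
    ws = sorted(line.words)
    return [(a[1], b[0]) for a, b in zip(ws, ws[1:]) if b[0] - a[1] >= min_gap]


def shares_gap(g1: List[Tuple[int, int]], g2: List[Tuple[int, int]]) -> bool:
    """True if some wide gap in g1 overlaps horizontally with one in g2."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in g1 for s2, e2 in g2)


def tabular_blocks(lines: List[TextLine], min_rows: int = 3,
                   min_gap: int = 30) -> List[Tuple[int, int]]:
    """Find runs of at least min_rows consecutive lines whose wide gaps align.

    Returns inclusive (first_line, last_line) index pairs. This alignment
    heuristic is a hypothetical stand-in for the paper's page analysis.
    """
    blocks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    prev: List[Tuple[int, int]] = []
    for i, line in enumerate(lines):
        gaps = wide_gaps(line, min_gap)
        if not (gaps and prev and shares_gap(prev, gaps)):
            # The current run (if any) ends at the previous line.
            if start is not None and i - start >= min_rows:
                blocks.append((start, i - 1))
            start = i if gaps else None
        prev = gaps
    if start is not None and len(lines) - start >= min_rows:
        blocks.append((start, len(lines) - 1))
    return blocks


def is_tabular_page(lines: List[TextLine]) -> bool:
    """Stage 1: a page is tabular if it holds at least one candidate block."""
    return bool(tabular_blocks(lines))


# Usage: two paragraph lines, three column-aligned rows, one paragraph line.
para = TextLine([(0, 50), (55, 100), (105, 160)])
row = TextLine([(0, 50), (100, 150), (200, 250)])
page = [para, para, row, row, row, para]
print(is_tabular_page(page), tabular_blocks(page))
```

Only pages for which `is_tabular_page` returns `True` would be passed on to the kind-specific stages, which is where the time saving claimed in the abstract would come from under this reading.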

Journal: International Journal of Digital Library Systems
Publication Date: Jan 1, 2011
Citations: 1

Similar Papers

A Complete System for Detection and Identification of Tabular Structures from Document Images
S Mandal ... A K Das
01 Jan 2004

Current Status and Performance Analysis of Table Recognition in Document Images With Deep Neural Networks
Khurram Azeem Hashmi ... Marcus Liwicki
IEEE Access | VOL. 9
01 Jan 2020

Holistic design for deep learning-based discovery of tabular structures in datasheet images
Ertugrul Kara ... Shahzad Khan
Engineering Applications of Artificial Intelligence | VOL. 90
15 Feb 2020

A Model Based Framework for Table Processing in Degraded Document Images
Zhixin Shi ... Venu Govindaraju
01 Aug 2013
