Semisupervised Wrapper Choice and Generation for Print-Oriented Documents

Alberto Bartoli, Giorgio Davanzo, Enrico Sorio, Eric Medvet

doi:10.1109/tkde.2012.254

Abstract

Information extraction from printed documents is still a crucial problem in many interorganizational workflows. Solutions for other application domains, for example, the web, do not fit this peculiar scenario well, as printed documents do not carry any explicit structural or syntactical description. Moreover, printed documents usually lack any explicit indication about their source. We present a system, which we call PATO, for extracting predefined items from printed documents in a dynamic multisource scenario. PATO selects the source-specific wrapper required by each document, determines whether no suitable wrapper exists, and generates one when necessary. PATO assumes that the need for new source-specific wrappers is a part of normal system operation: new wrappers are generated online based on a few point-and-click operations performed by a human operator on a GUI. The role of operators is an integral part of the design and PATO may be configured to accommodate a broad range of automation levels. We show that PATO exhibits very good performance on a challenging data set composed of more than 600 printed documents drawn from three different application domains: invoices, datasheets of electronic components, and patents. We also perform an extensive analysis of the crucial tradeoff between accuracy and automation level.

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Jan 1, 2014
Citations: 36

Similar Papers

A Taxonomy for Levels of Automation based on the Industrial Revolutions
Giacomo Barbieri ... David Sanchez-Londoño
IFAC PapersOnLine | VOL. 55
01 Jan 2021

Situations Saved by the Human Operator when Automation Failed
...
Chemical Engineering Transactions | VOL. 31
20 May 2013

Level of automation effects on performance, situation awareness and workload in a dynamic control task
Mica R Endsley ... David B Kaber
Ergonomics | VOL. 42
01 Mar 1999

The Combined Effect of Level of Automation and Adaptive Automation on Human Performance with Complex, Dynamic Control Systems
David B Kaber ... Mica R Endsley
Proceedings of the Human Factors and Ergonomics Society Annual Meeting | VOL. 41
01 Oct 1997
