Learning to extract hierarchical information from semi-structured documents

Wai-Yip Lin, Wai Lam

doi:10.1145/354756.354826

Abstract

Existing wrapper learning methods require various assumptions or information about the document structure, and many of them can handle only documents with simple structures. To handle a richer set of semi-structured documents and minimize the burden on the user, we develop a new method, known as HISER (HIerarchical record Structure and Extraction Rule learning). Our HISER approach employs a two-stage learning task, namely, hierarchical record structure learning and extraction rule learning. In hierarchical record structure learning, we try to automatically generate a representation of the hierarchical structure of the records in an information source. In extraction rule learning, extraction rules are induced for each node in the hierarchical record structure. This design can handle missing items, multi-valued items, and items in unrestricted order. We also incorporate both syntactic and semantic generalization in the learning process to enrich the expressiveness of the extraction rules.
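
The abstract does not give the rule syntax or the learning procedure, so the following is only a minimal sketch of the general idea of attaching one extraction rule to each node of a hierarchical record structure. It assumes regular expressions as a stand-in for learned extraction rules and shows no learning at all. The names `Node` and `extract`, the patterns, and the sample page are hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    """One node of a hypothetical hierarchical record structure."""
    name: str
    pattern: str                 # stand-in extraction rule: regex with one capture group
    multi_valued: bool = False   # True if the item may occur several times
    children: List["Node"] = field(default_factory=list)


def extract(node: Node, text: str) -> Any:
    """Apply the node's rule to `text`, then recurse into each matched span."""
    spans = [m.group(1) for m in re.finditer(node.pattern, text, re.S)]
    if not node.multi_valued:
        spans = spans[:1]        # single-valued item: keep the first match only
    results = []
    for span in spans:
        if node.children:
            # Each child rule searches the whole parent span independently,
            # so the order of items inside the record does not matter.
            results.append({c.name: extract(c, span) for c in node.children})
        else:
            results.append(span.strip())
    if node.multi_valued:
        return results           # possibly empty list
    return results[0] if results else None   # None marks a missing item


# Hypothetical structure: a page holds many books; a book has one title,
# any number of authors, and an optional price.
book = Node("book", r"<li>(.*?)</li>", multi_valued=True, children=[
    Node("title", r"<b>(.*?)</b>"),
    Node("author", r"<i>(.*?)</i>", multi_valued=True),
    Node("price", r"\$(\d+(?:\.\d+)?)"),
])

page = (
    "<li><b>Book A</b> <i>Ann</i> <i>Bob</i> $10.50</li>"
    "<li>$7 <i>Cy</i> <b>Book B</b></li>"
    "<li><b>Book C</b></li>"
)
records = extract(book, page)
```

In this toy input the first record has a multi-valued author item, the second lists its items in a different order, and the third is missing both author and price. These are the three cases the abstract says the design can handle.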


Similar Papers


Effects of Initialization on Rule Extraction in Structural Learning
Hiroshi Shiratsuchi ... Kousuke Kumamaru
Journal of Advanced Computational Intelligence and Intelligent Informatics | VOL. 12
20 Jan 2008

Rule Extraction from Data
Takeshi Furuhashi
Journal of Advanced Computational Intelligence and Intelligent Informatics | VOL. 3
20 Oct 1999

Rule Extraction by Structural Learning with an Immediate Critic
Masumi Ishikawa
Journal of Advanced Computational Intelligence and Intelligent Informatics | VOL. 3
20 Oct 1999

Rule Learning and Extraction Using a Hybrid Neural Network: A Case Study on Fault Detection and Diagnosis
Shing Chiang Tan ... Chee Peng Lim
23 Jan 2013
