Building intelligent systems for mining information extraction rules from web pages by using domain knowledge

Keekyoung Seo, Jaeyoung Yang, Joongmin Choi

doi:10.1109/isie.2001.931807

Abstract

Previous research on automatic information extraction experienced difficulties in acquiring and representing useful domain knowledge and in coping with the structural heterogeneity among different information sources. As a result, many real-world information sources with complex document structures could not be correctly analyzed. In order to resolve these problems, this paper presents a method of building intelligent systems for mining information extraction rules from semi-structured Web pages by using domain knowledge. This system automatically generates a wrapper for each information source and performs information extraction and information integration by applying this wrapper to the corresponding source. Both the domain knowledge and the wrapper are represented by XML documents to increase flexibility and interoperability. By testing our prototype system on several real-estate information sites, we can claim that it creates the correct wrappers for most Web sources and consequently facilitates effective information extraction for heterogeneous information sources.
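The idea of driving extraction from domain knowledge stored as XML can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not give the XML schema or the rule-mining procedure, so the element names (`domain`, `field`), the `pattern` attribute, the regular expressions, and the functions `load_rules` and `extract` are all assumptions made for illustration. The sketch only shows how one declarative description of a domain (here, real-estate listings) might be applied to differently formatted sources.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical domain knowledge for real-estate listings. The element and
# attribute names and the patterns are illustrative, not taken from the paper.
DOMAIN_KNOWLEDGE_XML = r"""
<domain name="real-estate">
  <field name="price" pattern="\$\s?([\d,]+)"/>
  <field name="bedrooms" pattern="(\d+)\s*(?:bedrooms?|beds?|BR)\b"/>
</domain>
"""


def load_rules(xml_text):
    """Parse the XML domain knowledge into {field name: compiled regex}."""
    root = ET.fromstring(xml_text.strip())
    return {
        field.get("name"): re.compile(field.get("pattern"), re.IGNORECASE)
        for field in root.iter("field")
    }


def extract(fragment, rules):
    """Apply every field rule to one page fragment; missing fields map to None."""
    record = {}
    for name, regex in rules.items():
        match = regex.search(fragment)
        record[name] = match.group(1) if match else None
    return record


rules = load_rules(DOMAIN_KNOWLEDGE_XML)
# Two sources that lay out the same information differently.
print(extract("<li>3 bedrooms, 2 baths, $250,000</li>", rules))
print(extract("Price: $ 180,500 | 2 BR", rules))
```

Keeping the field descriptions in XML rather than in code means a new site or a new field could be handled by editing the document alone, which is one plausible reading of the flexibility the abstract refers to. A real wrapper generator would additionally have to learn where each field sits in a page's structure, which this sketch does not attempt.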

Similar Papers


Automatic information extraction from semi-structured Web pages by pattern discovery
Chia-Hui Chang ... Shao-Cheng Lui
Decision Support Systems | VOL. 35
28 May 2002

Utility of Web Content Blocks in Content Extraction
Marek Kowalkiewicz
01 Jan 2007

WIEAS: Helping to Discover Web Information Sources and Extract Data from Them
Liyu Li ... Shiwei Tang
01 Jan 2004

Multimedia Design: from tools for skilled designers to intelligent multimedia design systems
01 Jan 1998
