Exploiting the Information Web

Dawn G. Gregg, Steven Walczak

doi:10.1109/tsmcc.2006.876061

Abstract

The World Wide Web is an increasingly important data source for business decision making; however, extracting information from the Web remains one of the challenging issues related to Web business intelligence applications. To use heterogeneous Web data for decision making, documents containing relevant data must be located, and the data of interest within those documents must be identified and extracted. Currently, most automatic information extraction systems can cope with only a limited set of document formats or do not adapt well to changes in document structure; as a result, many real-world data sources with complex document structures cannot be consistently interpreted by a single information extraction system. This paper presents an adaptive information extraction system prototype that combines multiple information extraction approaches to allow more accurate and resilient data extraction for a wide variety of Web sources. The Amorphic Web information extraction system prototype can locate data of interest based on domain knowledge or page structure, can automatically generate a wrapper for a data source, and can detect when the structure of a Web-based resource has changed and then search the updated resource to locate the desired data. The Amorphic prototype demonstrated improved information extraction accuracy, compared with traditional data extraction approaches, in each of the four extraction scenarios examined.
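The abstract does not say how Amorphic combines structural and domain-knowledge extraction, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical wrapper whose structural rule is a single tag path (learned from one known example value) and whose domain knowledge is a regular expression. When the stored path no longer yields a matching value, the wrapper treats the page structure as changed, falls back to a pattern search over every text node, and re-learns the path. The names `TextPathParser`, `AdaptiveWrapper`, `induce`, and `extract` are illustrative inventions.

```python
import re
from html.parser import HTMLParser


class TextPathParser(HTMLParser):
    """Collects (tag_path, text) pairs for every non-empty text node."""

    # Void elements never get a closing tag, so they are not pushed on the stack.
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.nodes = []  # collected (tag_path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched end tags are ignored.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def text_nodes(html):
    parser = TextPathParser()
    parser.feed(html)
    parser.close()
    return parser.nodes


class AdaptiveWrapper:
    """Hypothetical wrapper: structural rule first, domain-knowledge fallback."""

    def __init__(self, pattern):
        self.pattern = re.compile(pattern)  # domain knowledge, e.g. a price format
        self.path = None                    # structural rule (tag path)

    def induce(self, html, example):
        """Learn the tag path of the first text node containing a known value."""
        for path, text in text_nodes(html):
            if example in text:
                self.path = path
                return path
        raise ValueError("example value not found on page")

    def extract(self, html):
        """Return (value, structure_changed)."""
        nodes = text_nodes(html)
        # 1. Try the structural rule learned from the previous layout.
        for path, text in nodes:
            if path == self.path:
                match = self.pattern.search(text)
                if match:
                    return match.group(), False
        # 2. Rule failed: assume the layout changed, search by domain
        #    knowledge, and re-learn the path for subsequent calls.
        for path, text in nodes:
            match = self.pattern.search(text)
            if match:
                self.path = path
                return match.group(), True
        return None, True


if __name__ == "__main__":
    old = "<html><body><table><tr><td>Widget</td><td>$19.99</td></tr></table></body></html>"
    new = "<html><body><div class='item'><span>Widget</span><b>$21.50</b></div></body></html>"
    wrapper = AdaptiveWrapper(r"\$\d+\.\d{2}")
    print(wrapper.induce(old, "$19.99"))  # html/body/table/tr/td
    print(wrapper.extract(old))           # ('$19.99', False)
    print(wrapper.extract(new))           # ('$21.50', True)
```

Checking the structural rule first keeps extraction cheap and precise on unchanged pages. The pattern search is slower and can pick the wrong match on noisy pages, so in this sketch it is used only as the recovery path when a change is detected.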

Journal: IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews)
Publication Date: Jan 1, 2007
Citations: 52

Similar Papers

A Survey of Web Information Extraction Systems
Chia-Hui Chang ... M Kayed
IEEE Transactions on Knowledge and Data Engineering, Vol. 18, 01 Oct 2006

Use of a Fast Information Extraction Method as a Decision Support Tool
Mahmudul Sheikh ... Sumali Conlon
Journal of International Technology and Information Management, Vol. 19, 01 Jan 2009

FVI-BD: Multiple File Extraction using Fusion Vector Investigation (FVI) in Big Data Hadoop Environment
V Vadivu ... N Kavitha
International Journal on Recent and Innovation Trends in Computing and Communication, Vol. 11, 13 Jul 2023

Inducing information extraction systems for new languages via cross-language projection
Ellen Riloff ... Charles Schafer
01 Jan 2002
