Information Extraction from the Web: System and Techniques

Luo Xiao, Stephan Jablonski, Dieter Wissmann, Michael Brown

doi:10.1023/b:apin.0000033637.51909.04

Abstract

Information Extraction (IE) systems that can exploit the vast body of textual information on the internet would be a revolutionary step forward in delivering large volumes of content cheaply and precisely, enabling a wide range of new knowledge-driven applications and services. Despite this enormous potential, few IE systems have successfully made the transition from laboratory to commercial application. The reason may be purely practical: building usable, scalable IE systems requires bringing together a range of different technologies and providing clear, reproducible guidelines for configuring and deploying those technologies together. This paper attempts to address these issues and has two primary goals. First, we show that an information extraction system intended for real-world applications and different domains can be built from autonomous, corporate components (agents). Such a system has several advanced properties: clear separation of the different extraction tasks and steps, portability to multiple application domains, trainability, extensibility, etc. Second, we show that machine learning, in particular learning in different ways and at different levels, can be used to build practical IE systems. We show that carefully selecting the right machine learning technique for each task, combined with selective sampling, can reduce the human effort required to annotate examples for building such systems.
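
To make the idea of selective sampling concrete, the sketch below shows uncertainty sampling, one common form of it: from a pool of unlabeled text snippets, only those the current model is least sure about are sent to a human annotator. The abstract does not say which sampling strategy the authors use, so this is an illustration under assumptions rather than the paper's method. The names `select_for_annotation` and `toy_price_model`, the example snippets, and the probability values are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def select_for_annotation(
    pool: Sequence[str],
    predict_proba: Callable[[str], float],
    batch_size: int,
) -> List[str]:
    """Return the `batch_size` pool items the model is least sure about."""
    # A binary classifier is most uncertain when its probability is near 0.5,
    # so the distance from 0.5 serves as a (reverse) uncertainty score.
    scored: List[Tuple[float, str]] = [
        (abs(predict_proba(text) - 0.5), text) for text in pool
    ]
    # Python's sort is stable, so ties keep their original pool order.
    scored.sort(key=lambda pair: pair[0])
    return [text for _, text in scored[:batch_size]]


def toy_price_model(text: str) -> float:
    """Hypothetical stand-in classifier: probability that `text` states a price."""
    if "$" in text:
        return 0.95  # confident positive
    if any(ch.isdigit() for ch in text):
        return 0.55  # contains a number, but unclear whether it is a price
    return 0.05  # confident negative


pool = [
    "Price: $499",
    "Model 3000 available now",
    "Contact us for details",
    "Ships in 2 days",
]
to_label = select_for_annotation(pool, toy_price_model, batch_size=2)
print(to_label)
```

In a real system the stand-in model would be replaced by the trained extractor, and the selected snippets would be annotated and added to the training set before the next round. The confidently classified snippets are never shown to the annotator, which is where the saving in human effort comes from.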

Journal: Applied Intelligence
Publication Date: Sep 1, 2004
Citations: 49

Similar Papers

PDF text classification to leverage information extraction from publication reports
Duy Duc An Bui ... Siddhartha Jonnalagadda
Journal of Biomedical Informatics | VOL. 61 | 01 Apr 2016

Inducing information extraction systems for new languages via cross-language projection
Ellen Riloff ... David Yarowsky
01 Jan 2002

Use of a Fast Information Extraction Method as a Decision Support Tool
Mahmudul Sheikh ... Sumali Conlon
Journal of International Technology and Information Management | VOL. 19 | 01 Jan 2009

Join Optimization of Information Extraction Output: Quality Matters!
Alpa Jain ... Luis Gravano
01 Mar 2009
