An unsupervised method for joint information extraction and feature mining across different Web sites

Tak-Lam Wong, Wai Lam

doi:10.1016/j.datak.2008.08.009

Abstract

We develop an unsupervised learning framework which can jointly extract information and conduct feature mining from a set of Web pages across different sites. One characteristic of our model is that it allows tight interactions between the tasks of information extraction and feature mining. Decisions for both tasks can be made in a coherent manner leading to solutions which satisfy both tasks and eliminate potential conflicts at the same time. Our approach is based on an undirected graphical model which can model the interdependence between the text fragments within the same Web page, as well as text fragments in different Web pages. Web pages across different sites are considered simultaneously and hence information from different sources can be effectively leveraged. An approximate learning algorithm is developed to conduct inference over the graphical model to tackle the information extraction and feature mining tasks. We demonstrate the efficacy of our framework by applying it to two applications, namely, important product feature mining from vendor sites, and hot item feature mining from auction sites. Extensive experiments on real-world data have been conducted to demonstrate the effectiveness of our framework.
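
To make the idea of interdependent decisions over an undirected graphical model concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a pairwise binary Markov random field in which each node is a text fragment, each label says whether the fragment is extracted (1) or not (0), and edges link related fragments within a page or across pages. Iterated conditional modes (ICM) is used as a simple stand-in for approximate inference; the function name `icm`, the `coupling` weight, and all scores are hypothetical values chosen for illustration.

```python
def icm(unary, edges, coupling=1.0, max_sweeps=20):
    """Approximate MAP labelling of a pairwise binary MRF via ICM.

    unary[i][y] : hypothetical score for fragment i taking label y (0 or 1)
    edges       : (i, j) pairs linking related fragments (same or different page)
    coupling    : assumed reward for each neighbour that shares the same label
    """
    n = len(unary)
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Start from the label each fragment prefers when considered in isolation.
    labels = [max((0, 1), key=lambda y: unary[i][y]) for i in range(n)]

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def local_score(y):
                # Own evidence plus agreement with the current neighbour labels.
                agree = sum(1 for j in neighbors[i] if labels[j] == y)
                return unary[i][y] + coupling * agree

            best = max((0, 1), key=local_score)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # a full sweep without changes is a local optimum
            break
    return labels


# Fragments 0 and 1 sit on one page; fragment 2 sits on a page from another site.
unary = [[0.2, 1.5], [0.6, 0.5], [0.1, 1.0]]
edges = [(0, 1), (1, 2)]

independent = icm(unary, edges, coupling=0.0)  # no interaction between fragments
joint = icm(unary, edges, coupling=1.0)        # neighbours influence each other
print(independent, joint)
```

In this toy setting, fragment 1 is weakly rejected when judged alone, but is accepted once its neighbours on the same page and on the other site are taken into account. This is the kind of effect that considering pages from different sites simultaneously is meant to exploit.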

Journal: Data & Knowledge Engineering
Publication Date: Sep 19, 2008
Citations: 75

Similar Papers

Collaborative Information Extraction and Mining from Multiple Web Documents
Tak-Lam Wong ... Shing-Kit Chan
20 Apr 2006

Automated Semantic Analysis of Schematic Data
Saikat Mukherjee ... I V Ramakrishnan
World Wide Web | VOL. 11
13 Jun 2008

Multilingual document mining and navigation using self-organizing maps
Hsin-Chang Yang ... Chung-Hong Lee
Information Processing and Management | VOL. 47
08 Jan 2010

Improving information extraction from visually rich documents using visual span representations
Ritesh Sarkhel ... Arnab Nandi
Proceedings of the VLDB Endowment | VOL. 14
01 Jan 2020
