Mining templates from search result records of search engines

Hongkun Zhao, Clement Yu, Weiyi Meng

doi:10.1145/1281192.1281286

Abstract

Metasearch engines, comparison-shopping systems, and Deep Web crawlers need to extract the search result records embedded in the result pages that search engines return in response to user queries. The search result records from a given search engine are usually formatted according to a template. Identifying this template precisely makes it much easier to extract and annotate the data units within each record correctly. In this paper, we propose a graph model to represent the record template and develop a domain-independent statistical method that automatically mines the record template for any search engine from sample search result records. Our approach identifies both template tags (HTML tags) and template texts (non-tag texts), and it explicitly addresses mismatches between the tag structures and the data structures of search result records. Our experimental results indicate that this approach is very effective.
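
The idea of separating template tokens from data tokens can be illustrated with a minimal sketch. It is not the paper's graph model or its statistical method; it is a simplified frequency-based stand-in which assumes that a token appearing in most sample records belongs to the template. The function names, the `min_support` threshold, and the sample records are all hypothetical.

```python
import re
from collections import Counter

# A token is either a whole HTML tag or a run of non-space, non-tag text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")
# Used to drop attributes so that <a href="..."> is compared as <a>.
TAG_RE = re.compile(r"<(/?\w+)[^>]*>")


def tokenize(record_html):
    """Split one record into normalized tag tokens and text tokens."""
    tokens = TOKEN_RE.findall(record_html)
    return [TAG_RE.sub(r"<\1>", tok) for tok in tokens]


def mine_template_tokens(records, min_support=0.8):
    """Return tokens that occur in at least min_support of the records.

    Assumption (not from the paper): high document frequency across
    sample records is treated as evidence that a token is part of the
    template rather than record-specific data.
    """
    counts = Counter()
    for record in records:
        # Count each token once per record (document frequency).
        counts.update(set(tokenize(record)))
    threshold = min_support * len(records)
    return {tok for tok, n in counts.items() if n >= threshold}


# Hypothetical sample records from one imaginary search engine.
records = [
    '<a href="/p/1">Red Kettle</a><br>Price: $20',
    '<a href="/p/2">Blue Teapot</a><br>Price: $35',
    '<a href="/p/3">Green Mug</a><br>Price: $8',
]
template = mine_template_tokens(records)
```

In this toy input, the tags `<a>`, `</a>`, `<br>` and the text `Price:` recur in every record, so they are classed as template tags and template text, while product names and prices are left as data. A frequency count alone ignores token order and the tag/data structure mismatches that the abstract says the graph model is designed to handle.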

Similar Papers

Promoting Agriculture Knowledge via Public Web Search Engines: An Experience by an Iranian Librarian in Response to Agricultural Queries
Sedigheh Mohamadesmaeil ... Saeed Ghaffari
COLLNET Journal of Scientometrics and Information Management | VOL. 6
01 Dec 2012

Automatically Mining Result Records from Search Engine Response Pages
D Mundluru ... S Celebi
27 Nov 2005

The Factors Affecting the Performance of Data Fusion Algorithms
Mohammad Othman Nassar ... Ghassan Kanaan
01 Jan 2009

Advanced Metasearch Engine Technology
Weiyi Meng ... Clement T. Yu
01 Jan 2010
