ViDE: A Vision-Based Approach for Deep Web Data Extraction

Wei Liu, Weiyi Meng, Xiaofeng Meng

doi:10.1109/tkde.2009.109

Abstract

Deep Web contents are accessed through queries submitted to Web databases, and the returned data records are wrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem because of the intricate underlying structures of such pages. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they are Web-page-programming-language-dependent. Because Web pages are a popular two-dimensional medium, their contents are always displayed in a regular layout for users to browse. This motivates us to seek a different way to extract deep Web data, one that overcomes the limitations of previous work by exploiting common visual features of deep Web pages. In this paper, we propose a novel vision-based approach that is Web-page-programming-language-independent. The approach primarily uses the visual features of deep Web pages to perform deep Web data extraction, including data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the amount of human effort needed to produce perfect extraction. Our experiments on a large set of Web databases show that the proposed vision-based approach is highly effective for deep Web data extraction.

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Mar 1, 2010
Citations: 273

Similar Papers

Visual Architecture based Web Information Extraction
Oswalt Manoj S
Bonfring International Journal of Data Mining | VOL. 1
30 Dec 2011

I-ViDE: An Improved Vision-Based Approach for Deep Web Data Extraction
Mrudula Varade ... Vimla Jethani
IOSR Journal of Computer Engineering | VOL. 16
01 Jan 2014

Using Visual Clues Concept for Extracting Main Data from Deep Web Pages
Satish J Pusdekar ... Shaikh Phiroj Chhaware
01 Jan 2014

DWDE-IR: An Efficient Deep Web Data Extraction for Information Retrieval on Web Mining
Aysha Banu ... M Chitra
Journal of Emerging Technologies in Web Intelligence | VOL. 6
01 Feb 2014
