I-ViDE: An Improved Vision-Based Approach for Deep Web Data Extraction

Mrudula Varade, Vimla Jethani

doi:10.9790/0661-16440922

Abstract

Deep Web content is accessed through queries submitted to Web databases, and the returned data records are wrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem because of the intricate underlying structure of such pages. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they depend on the HTML language: they do not take visual features into consideration, and most of them rely on table tags. A vision-based approach to Web data extraction overcomes these limitations by exploiting common visual features of the Web page. However, this approach has one drawback: it can process only Web pages that contain a single data region, which reduces its precision and recall. Precision is the fraction of extracted data records that are correct, and recall is the fraction of the relevant data records on the page that are extracted. The proposed ImprovedViDE (I-ViDE) approach handles multiple data regions in deep Web pages, which can improve both precision and recall.
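
The two measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions of precision and recall over sets of records; the function name, the record identifiers, and the two-region page are hypothetical and are not taken from the paper.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for extracted records against the ground truth.

    Assumes the standard set-based definitions; both arguments are
    collections of hashable record identifiers.
    """
    extracted = set(extracted)
    relevant = set(relevant)
    correct = extracted & relevant  # records that were extracted and are relevant
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical deep Web page with two data regions of three records each.
region_a = {"a1", "a2", "a3"}
region_b = {"b1", "b2", "b3"}
relevant = region_a | region_b

# An extractor limited to one data region never sees the records in region_b.
single_region = precision_recall(region_a, relevant)

# An extractor that covers both regions can recover every record.
multi_region = precision_recall(region_a | region_b, relevant)
```

In this toy case the single-region extractor reaches only half the recall of the multi-region one, because every record in the second region is missed. How precision changes depends on what the extractor does with the regions it cannot process, which this sketch does not model.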

Full Text