Abstract

Extracting data from deep Web pages is a challenging problem due to the underlying intricate structures of such pages. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they are Web-page-programming-language-dependent. The contents on Web pages are always displayed regularly for users to browse. There is different ways for deep Web data extraction to overcome the limitations of previous works by utilizing some interesting common visual features on the deep Web pages. In this paper vision-based approach is Web page programming-language-independent approach is proposed. This approach utilizes the visual features of the web pages to extract data from deep web pages including data record extraction and data item extraction. Again we also propose a new evaluation measure revision to capture human effort needed to produce exact extraction of data. Our implementation on large set of web databases describes the proposed vision-based approach is highly effective for data extraction from deep web pages.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call