Abstract

Deep Web sources contain a large of high-quality and query-related structured date. One of the challenges in the Deep Web is extracting result schemas of Deep Web sources. To address this challenge, this paper describes a novel approach that extracts both result data and the result schema of a Web database. The approach first models the query interface of a Deep Web source and fills in it with a specifically query instance. Then the result pages of the Deep Web sources are formatted in the tree structure to retrieve subtrees that contain elements of the query instance. Next, result schema of the Deep Web source is extracted by matching the subtree’ nodes with the query instance, in which, a two-phase schema extraction method is adopted for obtaining more accurate result schema. Finally, experiments on real Deep Web sources show the utility of our approach, which provides a high precision and recall.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.