FiVaTech: Page-Level Web Data Extraction from Template Pages

Mohammed Kayed, Chia-Hui Chang

doi:10.1109/tkde.2009.82

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Feb 1, 2010
Citations: 127

Affiliation: Beni-Suef University, National Central University

#Web Data Extraction #Tree Templates

Abstract

Web data extraction is an important part of many Web data analysis applications. In this paper, we formulate the data extraction problem as the decoding process of page generation based on structured data and tree templates. We propose an unsupervised, page-level data extraction approach, FiVaTech, that deduces the schema and templates for each individual deep Website, whose Webpages contain either a singleton data record or multiple data records. FiVaTech applies tree matching, tree alignment, and mining techniques to accomplish this challenging task. In experiments, FiVaTech achieves much higher precision than EXALG and is comparable with record-level extraction systems such as ViPER and MSE. The results are encouraging on the test pages used in many state-of-the-art Web data extraction works.
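The abstract names tree matching as one of the building blocks but gives no algorithmic detail. As a rough illustration only, the sketch below implements the classic "simple tree matching" dynamic program on tag trees, which is a common way to score how much structure two DOM subtrees share. It is not claimed to be FiVaTech's own matching procedure; the `Node` class, the function names, and the normalized score are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def tree_size(node: Node) -> int:
    """Count the nodes in the subtree rooted at `node`."""
    return 1 + sum(tree_size(child) for child in node.children)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the largest order-preserving, top-down matching of a and b."""
    if a.tag != b.tag:
        return 0  # roots differ, so nothing below them can be matched
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best score using the first i children of a and first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = dp[i - 1][j - 1] + simple_tree_matching(
                a.children[i - 1], b.children[j - 1]
            )
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], paired)
    return dp[m][n] + 1  # +1 for the matched roots


def similarity(a: Node, b: Node) -> float:
    """Matching score normalized by the average size of the two trees."""
    return simple_tree_matching(a, b) / ((tree_size(a) + tree_size(b)) / 2)
```

A score close to 1 suggests that two subtrees were produced by the same part of a template, which is the kind of signal an alignment step can then build on; any threshold for "similar enough" would have to be chosen empirically.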
