NET – A System for Extracting Web Data from Flat and Nested Data Records

Bing Liu, Yanhong Zhai

doi:10.1007/11581062_39


Open Access


Publication Date: Jan 1, 2005

Citations: 83

Affiliation: University of Illinois at Chicago

Keywords: automatic extraction of data, data items


Abstract

This paper studies automatic extraction of structured data from Web pages. Each such page may contain several groups of structured data records. Existing automatic methods still have several limitations. In this paper, we propose a more effective method for the task. Given a page, our method first builds a tag tree based on visual information. It then performs a post-order traversal of the tree, matching subtrees along the way using a tree edit distance method and visual cues. When the traversal ends, the data records have been found, and the data items within them are aligned and extracted. The method can extract data from both flat and nested data records. Experimental evaluation shows that the method performs the extraction task accurately.
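The abstract names the core ingredients (a tag tree, a post-order traversal, and subtree matching by a tree edit distance method) without giving the algorithm itself. The following is a minimal sketch of how those pieces can fit together, under several assumptions. It assumes the tag tree has already been built, and it omits the visual-information step entirely. As a stand-in for the paper's tree edit distance method, it uses the simple tree matching algorithm, which is a restricted, top-down form of tree matching. The score normalization and the 0.8 threshold are hypothetical choices. `Node`, `simple_tree_match`, `similarity` and `find_record_groups` are illustrative names and do not come from the paper. The sketch also leaves out visual cues, NET's treatment of nested records, and the alignment of data items.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tag-tree node: an HTML tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def simple_tree_match(a: Node, b: Node) -> int:
    """Size of the maximum top-down matching between trees a and b
    (simple tree matching); roots with different tags match nothing."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # M[i][j]: best matching between the first i children of a and first j of b
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i - 1][j],
                M[i][j - 1],
                M[i - 1][j - 1]
                + simple_tree_match(a.children[i - 1], b.children[j - 1]),
            )
    return M[m][n] + 1  # +1 for the matched roots


def similarity(a: Node, b: Node) -> float:
    """Match score normalized to [0, 1] (this normalization is an assumption)."""
    return 2 * simple_tree_match(a, b) / (size(a) + size(b))


def find_record_groups(root: Node, threshold: float = 0.8) -> List[List[Node]]:
    """Post-order traversal: after visiting a node's children, group runs of
    adjacent child subtrees that are similar enough to be records of one kind."""
    groups: List[List[Node]] = []

    def visit(node: Node) -> None:
        for child in node.children:  # children first => post-order
            visit(child)
        run = node.children[:1]
        for prev, cur in zip(node.children, node.children[1:]):
            if similarity(prev, cur) >= threshold:
                run.append(cur)
            else:
                if len(run) > 1:
                    groups.append(run)
                run = [cur]
        if len(run) > 1:
            groups.append(run)

    visit(root)
    return groups


# Usage: three table rows with nearly identical structure form one record group.
def record(*leaf_tags: str) -> Node:
    return Node("tr", [Node("td", [Node(t)]) for t in leaf_tags])


page = Node("table", [record("img", "a", "span"),
                      record("img", "a", "span"),
                      record("img", "a", "b")])
groups = find_record_groups(page)  # one group containing all three rows
```

Because the comparison happens after a node's children have been visited, the sketch can detect repeated structure at inner levels before the outer levels are considered. That ordering is the reason a post-order traversal suits this task. A real extractor would also need the visual cues and the item-alignment step that the abstract describes.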


