Abstract

Web data extraction refers to technology that helps people find the information they want on the Web. We first classify existing data extraction algorithms into two classes, top-down and bottom-up, and then analyze their strengths and weaknesses in terms of extraction accuracy. On the basis of this analysis, we present a hybrid algorithm, bi-direction data extraction (BiDDE for short), which combines the strengths of both top-down and bottom-up algorithms while avoiding their weaknesses. The experimental results show that BiDDE not only achieves higher accuracy than either the top-down or the bottom-up algorithm, but also delivers satisfactory performance.

