Abstract

The Web is the largest repository of data available, with over 150 million high-quality tables. Several works have sought to enable queries over these tables, but challenges remain, such as the wide variety of structures in which tables appear on the Web. In this paper, we propose a taxonomy of tabular structures and formalize the structures that correspond to relational data. Through an experimental evaluation, we show that WTClassifier, our supervised framework, classifies Web tables with high accuracy. We also use WTClassifier to categorize more than 300 thousand Web tables according to our taxonomy and find that 82.25% of them are not formatted like relational tables.
