InfoGather

Mohamed Yakout, Surajit Chaudhuri, Kaushik Chakrabarti, Kris Ganjam

doi:10.1145/2213836.2213848

Abstract

The Web contains a vast corpus of HTML tables, specifically entity-attribute tables. We present three core operations, namely entity augmentation by attribute name, entity augmentation by example, and attribute discovery, that are useful for information gathering tasks (e.g., researching products or stocks). We propose to use a web table corpus to perform them automatically. We require the operations to have high precision and coverage, have fast (ideally interactive) response times, and be applicable to any arbitrary domain of entities. The naive approach that attempts to directly match the user input with the web tables suffers from poor precision and coverage. Our key insight is that we can achieve much higher precision and coverage by considering indirectly matching tables in addition to the directly matching ones. The challenge is to be robust to spuriously matched tables: we address it by developing a holistic matching framework based on topic-sensitive PageRank and an augmentation framework that aggregates predictions from multiple matched tables. We propose a novel architecture that leverages preprocessing in MapReduce to achieve extremely fast response times at query time. Our experiments on real-life datasets and 573M web tables show that our approach has (i) significantly higher precision and coverage and (ii) four orders of magnitude faster response times compared with the state-of-the-art approach.
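To make the two ideas in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a small hypothetical graph of web tables whose edge weights stand in for table-to-table match strength, runs a generic topic-sensitive (personalized) PageRank seeded on the directly matching table, and then aggregates each table's predicted value by score-weighted voting. The table names, edge weights, predicted values, and function names are all illustrative assumptions.

```python
from collections import defaultdict


def topic_sensitive_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Personalized PageRank by power iteration.

    edges: {table: {neighbor: weight}} (hypothetical table-to-table match strengths)
    seeds: {table: weight} for tables that directly match the user input
    """
    nodes = set(edges) | {n for nbrs in edges.values() for n in nbrs} | set(seeds)
    seed_total = sum(seeds.values())
    # The random walk restarts only at the seed (directly matching) tables.
    teleport = {n: seeds.get(n, 0.0) / seed_total for n in nodes}
    scores = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            out = edges.get(src, {})
            out_total = sum(out.values())
            if out_total == 0:
                # Dangling table: send its mass back to the seeds.
                for n in nodes:
                    nxt[n] += damping * scores[src] * teleport[n]
                continue
            for dst, weight in out.items():
                nxt[dst] += damping * scores[src] * weight / out_total
        scores = nxt
    return scores


def aggregate_predictions(table_values, scores):
    """Pick the value with the largest total score across the tables proposing it."""
    votes = defaultdict(float)
    for table, value in table_values.items():
        votes[value] += scores.get(table, 0.0)
    return max(votes, key=votes.get)


# Hypothetical example: T1 matches the query directly, T2 matches T1 strongly
# (an indirect match), and T3 is only weakly connected (likely spurious).
edges = {
    "T1": {"T2": 0.9, "T3": 0.1},
    "T2": {"T1": 1.0},
    "T3": {"T1": 1.0},
}
scores = topic_sensitive_pagerank(edges, seeds={"T1": 1.0})

# Each table proposes a value for the attribute being augmented.
table_values = {"T1": "Nikon", "T2": "Nikon", "T3": "Canon"}
best = aggregate_predictions(table_values, scores)
print(best)
```

In this toy graph the weakly connected table receives a small score, so its conflicting value is outvoted. That mirrors the robustness to spurious matches the abstract describes, though the paper's actual scoring and aggregation may differ.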

Similar Papers

Web Table Understanding by Collective Inference
San Kim ... Jianhua Feng
17 Oct 2018

Evaluation of Wayfinding Performance and Workload on Electronic Map Interface
Ya-Li Lin ... Cheng-Han Wang
01 Jan 2010

Stitching web tables for improving matching quality
Oliver Lehmberg ... Christian Bizer
Proceedings of the VLDB Endowment | VOL. 10
01 Aug 2017

Web table column categorisation and profiling
Oliver Lehmberg ... Christian Bizer
26 Jun 2016
