CERES

Colin Lockard, Arash Einolghozati, Xin Luna Dong, Prashant Shiralkar

doi:10.14778/3231751.3231758

Abstract

The web contains countless semi-structured websites, which can be a rich source of information for populating knowledge bases. Existing methods for extracting relations from the DOM trees of semi-structured webpages can achieve high precision and recall only when manual annotations for each website are available. Although there have been efforts to learn extractors from automatically generated labels, these methods are not sufficiently robust to succeed in settings with complex schemas and information-rich websites. In this paper we present a new method for automatic extraction from semi-structured websites based on distant supervision. We automatically generate training labels by aligning an existing knowledge base with a website and leveraging the unique structural characteristics of semi-structured websites. We then train a classifier based on the potentially noisy and incomplete labels to predict new relation instances. Our method can compete with annotation-based techniques in the literature in terms of extraction quality. A large-scale experiment on over 400,000 pages from dozens of multi-lingual long-tail websites harvested 1.25 million facts at a precision of 90%.
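The label-generation step described in the abstract can be illustrated with a minimal sketch. This is a toy simplification under stated assumptions, not the paper's algorithm: the knowledge base contents, the `distant_labels` helper, the XPaths, and the exact-match rule are all hypothetical, and the sketch does not use the structural characteristics of the website that the abstract says the method leverages.

```python
from collections import defaultdict

# Hypothetical toy knowledge base of (subject, predicate, object) triples.
KB = [
    ("Do the Right Thing", "directed_by", "Spike Lee"),
    ("Do the Right Thing", "release_year", "1989"),
]


def normalize(text):
    """Lowercase and collapse whitespace so string matching is less brittle."""
    return " ".join(text.lower().split())


def distant_labels(topic, text_nodes, kb):
    """Label each text node whose text equals a KB object value for the topic.

    text_nodes is a list of (xpath, text) pairs taken from one page's DOM.
    Returns (xpath, text, predicate) tuples to use as training labels.
    """
    # Map each normalized object value to the predicates it appears under.
    objects = defaultdict(set)
    for subj, pred, obj in kb:
        if normalize(subj) == normalize(topic):
            objects[normalize(obj)].add(pred)

    labels = []
    for xpath, text in text_nodes:
        for pred in sorted(objects.get(normalize(text), ())):
            labels.append((xpath, text, pred))
    return labels


# Hypothetical text nodes extracted from a film detail page.
page_nodes = [
    ("/html/body/h1", "Do the Right Thing"),
    ("/html/body/div[1]/span[2]", "Spike Lee"),
    ("/html/body/div[2]/span[2]", "1989"),
    ("/html/body/div[3]/span[2]", "Danny Aiello"),
]

labels = distant_labels("Do the Right Thing", page_nodes, KB)
for label in labels:
    print(label)
```

In this toy page the node containing "Danny Aiello" receives no label because the knowledge base has no matching fact, which shows why labels produced this way are incomplete. Exact string matching can also attach a wrong predicate when the same value appears in several places on a page, which is one source of the noise the classifier has to tolerate.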

Journal: Proceedings of the VLDB Endowment
Publication Date: Jun 1, 2018
Citations: 44

Similar Papers

Distant Supervision for Relation Extraction in The Persian Language using Piecewise Convolutional Neural Networks
Mehrdad Nasser ... Behrouz Minaei-Bidgoli
01 Apr 2019

Distant supervision for fine-grained biomedical relation extraction from Chinese EMRs
Qing Zhao ... Jianqiang Li
15 Dec 2022

ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages
Colin Lockard ... Hannaneh Hajishirzi
01 Jan 2020

Learning Named Entity Tagger using Domain-Specific Dictionary
Jingbo Shang ... Liyuan Liu
01 Jan 2018
