Querying probabilistic information extraction

Daisy Zhe Wang, Michael J Franklin, Minos Garofalakis, Joseph M Hellerstein

doi:10.14778/1920841.1920974

Abstract

Recently, there has been increasing interest in extending relational query processing to include data obtained from unstructured sources. A common approach is to use stand-alone Information Extraction (IE) techniques to identify and label entities within blocks of text; the resulting entities are then imported into a standard database and processed using relational queries. This two-part approach, however, suffers from two main drawbacks. First, IE is inherently probabilistic, but traditional query processing does not properly handle probabilistic data, resulting in reduced answer quality. Second, performance inefficiencies arise from the separation of IE from query processing. In this paper, we address these two problems by building on an in-database implementation of a leading IE model: Conditional Random Fields (CRFs) using the Viterbi inference algorithm. We develop two different query approaches on top of this implementation. The first uses deterministic queries over maximum-likelihood extractions, with optimizations to push the relational operators into the Viterbi algorithm. The second extends the Viterbi algorithm to produce a set of possible extraction "worlds", from which we compute top-k probabilistic query answers. We describe these approaches and explore the trade-offs of efficiency and effectiveness between them using two datasets.
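The abstract relies on two inference steps over a linear-chain CRF: Viterbi decoding for the single maximum-likelihood extraction, and an extension that returns several extraction "worlds". The Python sketch below illustrates both outside the database, plus the forward algorithm needed to turn world scores into probabilities. It is a generic textbook k-best Viterbi, not the paper's in-database implementation or its operator push-down optimizations. The toy tokens, the labels `num` and `street`, the log-potential values, and the function names `viterbi`, `top_k_viterbi`, and `log_partition` are all hypothetical.

```python
import heapq
import math

# Hypothetical toy CRF over the tokens "2181 Shattuck North" with two labels.
# EMISSION[t][y]: log-potential of label y at token t.
# TRANSITION[(y_prev, y)]: log-potential of label y following label y_prev.
LABELS = ["num", "street"]
EMISSION = [
    {"num": 2.0, "street": 0.0},
    {"num": 0.0, "street": 1.5},
    {"num": 0.0, "street": 1.0},
]
TRANSITION = {
    ("num", "num"): -1.0, ("num", "street"): 0.5,
    ("street", "num"): -0.5, ("street", "street"): 0.2,
}


def viterbi(labels, emission, transition):
    """Return (score, path) of the single highest-scoring label sequence."""
    # best[y] holds the best (score, path) over prefixes ending in label y.
    best = {y: (emission[0][y], [y]) for y in labels}
    for t in range(1, len(emission)):
        new_best = {}
        for y in labels:
            # Predecessor label maximizing prefix score plus transition into y.
            prev = max(labels, key=lambda yp: best[yp][0] + transition[(yp, y)])
            score = best[prev][0] + transition[(prev, y)] + emission[t][y]
            new_best[y] = (score, best[prev][1] + [y])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])


def top_k_viterbi(labels, emission, transition, k):
    """Return the k highest-scoring (score, path) pairs, best first."""
    # beams[y] holds up to k (score, path) prefixes ending in label y.
    beams = {y: [(emission[0][y], [y])] for y in labels}
    for t in range(1, len(emission)):
        new_beams = {}
        for y in labels:
            # Extend every kept prefix of every predecessor label with y.
            candidates = [
                (score + transition[(yp, y)] + emission[t][y], path + [y])
                for yp in labels
                for score, path in beams[yp]
            ]
            new_beams[y] = heapq.nlargest(k, candidates, key=lambda c: c[0])
        beams = new_beams
    finals = [entry for entries in beams.values() for entry in entries]
    return heapq.nlargest(k, finals, key=lambda c: c[0])


def log_partition(labels, emission, transition):
    """Forward algorithm: log Z, the log-sum-exp of all sequence scores."""
    def logsumexp(values):
        m = max(values)
        return m + math.log(sum(math.exp(v - m) for v in values))

    alpha = {y: emission[0][y] for y in labels}
    for t in range(1, len(emission)):
        alpha = {
            y: emission[t][y]
            + logsumexp([alpha[yp] + transition[(yp, y)] for yp in labels])
            for y in labels
        }
    return logsumexp(list(alpha.values()))


log_z = log_partition(LABELS, EMISSION, TRANSITION)
for score, path in top_k_viterbi(LABELS, EMISSION, TRANSITION, k=3):
    # Probability of this extraction "world" under the CRF: exp(score) / Z.
    print(path, round(math.exp(score - log_z), 3))
```

Keeping k prefixes per label at each token is enough for an exact k-best result on a first-order chain: if k other prefixes ending in the same label beat a given prefix, each of them could be extended with the same suffix, so that prefix cannot belong to a top-k sequence. Dividing each world's exponentiated score by Z converts the k worlds into probabilities that a top-k probabilistic query could rank.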

Journal: Proceedings of the VLDB Endowment
Publication Date: Sep 1, 2010
Citations: 54

Similar Papers

Probabilistic declarative information extraction
Daisy Zhe Wang ... Michael J Franklin
01 Jan 2009

Automated PII Extraction from Social Media for Raising Privacy Awareness: A Deep Transfer Learning Approach
Yizhi Liu ... Fang Yu Lin
02 Nov 2021

Assessment of Information Extraction Techniques, Models and Systems
Atta-Ur Rahman ... Dakheel Almoqbil
Mathematical Modelling of Engineering Problems | VOL. 9
30 Jun 2022

Beyond linear chain
Diego Marcheggiani
ACM SIGIR Forum | VOL. 48
26 Jun 2014

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Querying probabilistic information extraction

Abstract

Talk to us

Similar Papers

More From: Proceedings of the VLDB Endowment