Abstract

Probabilistic databases (PDBs) model uncertainty in data in a quantitative way. In the established formal framework, probabilistic (relational) databases are finite probability spaces over relational database instances. This finiteness can clash with intuitive query behavior (Ceylan et al., KR 2016), and with application scenarios that are better modeled by continuous probability distributions (Dalvi et al., CACM 2009). We formally introduced infinite PDBs in (Grohe and Lindner, PODS 2019) with a primary focus on countably infinite spaces. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerning the measurability of events and queries and, ultimately, the question of whether queries have a well-defined semantics. We argue that finite point processes are an appropriate model from probability theory for dealing with general probabilistic databases. This allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.

Highlights

  • Probabilistic databases (PDBs) are used to model uncertainty in data

  • In [GL19], we introduced an extended model of PDBs as arbitrary probability spaces over finite database instances

  • We have introduced views as functions mapping database instances to database instances and adopted a semantics based on possible worlds
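
The possible-worlds semantics of views mentioned above can be illustrated in the finite case only. The following is a minimal sketch, not the construction from the paper: the relation names `Temp` and `Warm`, the toy probabilities, and the helper names `warm_rooms` and `apply_view` are all assumptions made for illustration. A finite PDB is represented as a mapping from database instances to probabilities, and a view induces a new PDB by pushing the probabilities forward along the view function.

```python
from collections import defaultdict
from fractions import Fraction

# A database instance is a frozenset of facts; a fact is (relation, tuple).
# Hypothetical toy PDB: three possible worlds whose probabilities sum to 1.
pdb = {
    frozenset({("Temp", ("office", 20)), ("Temp", ("lab", 22))}): Fraction(1, 2),
    frozenset({("Temp", ("office", 21)), ("Temp", ("lab", 22))}): Fraction(1, 3),
    frozenset({("Temp", ("office", 21))}): Fraction(1, 6),
}

def warm_rooms(instance, threshold=21):
    """View: map an instance to the instance of rooms with temp >= threshold."""
    return frozenset(
        ("Warm", (room,))
        for rel, (room, temp) in instance
        if rel == "Temp" and temp >= threshold
    )

def apply_view(pdb, view):
    """Push the distribution forward along the view (possible-worlds semantics)."""
    out = defaultdict(Fraction)
    for instance, p in pdb.items():
        # Worlds that the view maps to the same output instance are merged.
        out[view(instance)] += p
    return dict(out)

result = apply_view(pdb, warm_rooms)

# Marginal probability that the fact Warm(office) appears in the view output.
p_office_warm = sum(
    p for inst, p in result.items() if ("Warm", ("office",)) in inst
)
```

In this finite setting every set of instances is an event, so no measurability question arises. For uncountable spaces of instances, the analogous push-forward is only well defined when the view is a measurable function, which is the kind of statement the paper's main results establish.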


Summary

Introduction

Probabilistic databases (PDBs) are used to model uncertainty in data. Such uncertainty can have various causes, for example noisy sensor data, incomplete or inconsistent information, or information gathered from unreliable sources [Agg09, SORK11]. In the standard formal framework, probabilistic databases are finite probability spaces whose sample spaces consist of database instances in the usual sense, referred to as “possible worlds”. This framework has various shortcomings due to its inherent closed-world assumption [CDVdB16, CDVdB21] and its restriction to finite domains. Statistical models of uncertain data, for example for temperature measurements as in Example 2.1, usually use continuous probability distributions in appropriate error models. Such continuous attribute-level uncertainty is not expressible in the traditional PDB model. In particular with respect to an open-world assumption, we would like

