Computing query probability with incidence algebras

Abstract

We describe an algorithm that evaluates queries over probabilistic databases using Möbius' inversion formula in incidence algebras. The queries we consider are unions of conjunctive queries (equivalently: existential, positive First Order sentences), and the probabilistic databases are tuple-independent structures. Our algorithm runs in PTIME on a subset of queries called safe queries, and is complete, in the sense that every unsafe query is hard for the class FP^{#P}. The algorithm is simple and easy to implement in practice, yet it is non-obvious. Möbius' inversion formula, which is in essence inclusion-exclusion, plays a key role for completeness, by allowing the algorithm to compute the probability of some safe queries even when they have some subqueries that are unsafe. We also apply the same lattice-theoretic techniques to analyze an algorithm based on lifted conditioning, and prove that it is incomplete.
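
To make the role of inclusion-exclusion concrete, here is a minimal Python sketch. It is not the paper's lifted algorithm: it works on a grounded lineage (a union of conjunctions of independent tuples), and the names `grouped_coefficients`, `probability`, and `brute_force` are hypothetical. It groups inclusion-exclusion terms by the set of tuples they mention and shows that some groups receive coefficient 0, which is the kind of cancellation that Möbius inversion makes explicit.

```python
from collections import defaultdict
from itertools import combinations, product
from math import prod


def grouped_coefficients(clauses):
    """Sum the inclusion-exclusion signs of all clause subsets that
    mention the same set of tuples (illustrative helper)."""
    coeff = defaultdict(int)
    for r in range(1, len(clauses) + 1):
        for subset in combinations(clauses, r):
            joint = frozenset().union(*subset)
            coeff[joint] += (-1) ** (r + 1)
    return dict(coeff)


def probability(clauses, p):
    """P(C1 or ... or Ck) for independent tuples with marginals p,
    skipping every group whose coefficient cancelled to zero."""
    return sum(
        c * prod(p[t] for t in joint)
        for joint, c in grouped_coefficients(clauses).items()
        if c != 0
    )


def brute_force(clauses, p):
    """Reference value: enumerate all possible worlds."""
    tuples = sorted(p)
    total = 0.0
    for bits in product([False, True], repeat=len(tuples)):
        present = {t for t, b in zip(tuples, bits) if b}
        if any(c <= present for c in clauses):
            total += prod(p[t] if t in present else 1 - p[t] for t in tuples)
    return total


# Assumed toy lineage: (a and b) or (b and c) or (c and d).
clauses = [frozenset("ab"), frozenset("bc"), frozenset("cd")]
p = {"a": 0.5, "b": 0.4, "c": 0.3, "d": 0.2}
coeff = grouped_coefficients(clauses)
```

In this toy lineage the term for the full tuple set `{a, b, c, d}` arises twice with opposite signs, so its coefficient is 0 and it never needs to be evaluated. The abstract's point is analogous at the query level: a subquery whose coefficient vanishes can be skipped even if it is unsafe.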

Similar Papers
  • Conference Article
  • Citations: 28
  • 10.1145/1938551.1938574
Knowledge compilation meets database theory
  • Mar 21, 2011
  • Abhay Jha + 1 more

The goal of Knowledge Compilation is to represent a Boolean expression in a format in which it can answer a range of online-queries in PTIME. The online-query of main interest to us is model counting, because of its application to query evaluation on probabilistic databases, but other online-queries can be supported as well such as testing for equivalence, testing for implication, etc. In this paper we study the following problem. Given a database query q, decide whether its lineage can be compiled efficiently into a given target language. We consider four target languages, of strictly increasing expressive power (when the size of compilation is constrained to be polynomial in the input size): Read-Once Boolean formulae, OBDD, FBDD and d-DNNF. For each target, we study the class of database queries that admit polynomial size representation: these queries can also be evaluated in PTIME over probabilistic databases. When queries are restricted to conjunctive queries without self-joins, it was known that these four classes collapse to the class of hierarchical queries, which is also the class of PTIME queries over probabilistic databases. Our main result in this paper is that, in the case of Unions of Conjunctive Queries (UCQ), these classes form a strict hierarchy. Thus, unlike conjunctive queries without self-joins, the expressive power of UCQ differs considerably w.r.t. these target compilation languages. Moreover, we give a complete characterization of the first two target languages, based on the query's syntax.
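
The abstract refers to hierarchical queries without defining them. Under the standard definition for Boolean conjunctive queries without self-joins (for any two variables, the sets of atoms containing them are either disjoint or one contains the other), the property can be checked as in the sketch below. The function name `is_hierarchical` and the dictionary encoding of a query are assumptions made for illustration.

```python
from itertools import combinations


def is_hierarchical(atoms):
    """atoms maps each atom name to the set of variables it contains.
    Returns True if, for every pair of variables, their atom sets are
    nested or disjoint (illustrative helper)."""
    variables = set().union(*atoms.values())
    at = {v: {a for a, vs in atoms.items() if v in vs} for v in variables}
    for x, y in combinations(sorted(variables), 2):
        ax, ay = at[x], at[y]
        if not (ax <= ay or ay <= ax or ax.isdisjoint(ay)):
            return False
    return True


# R(x), S(x, y): the atoms of y are a subset of the atoms of x.
q_safe = {"R": {"x"}, "S": {"x", "y"}}
# R(x), S(x, y), T(y): x and y share S, but neither atom set contains the other.
q_hard = {"R": {"x"}, "S": {"x", "y"}, "T": {"y"}}
```

Here `is_hierarchical(q_safe)` holds while `is_hierarchical(q_hard)` does not; the second query is the classic non-hierarchical pattern.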

  • Research Article
  • Citations: 58
  • 10.1007/s00224-012-9392-5
Knowledge Compilation Meets Database Theory: Compiling Queries to Decision Diagrams
  • Mar 6, 2012
  • Theory of Computing Systems
  • Abhay Jha + 1 more

The goal of Knowledge Compilation is to represent a Boolean expression in a format in which it can answer a range of “online-queries” in PTIME. The online-query of main interest to us is model counting, because of its application to query evaluation on probabilistic databases, but other online-queries can be supported as well such as testing for equivalence, testing for implication, etc. In this paper we study the following problem: given a database query q, decide whether its lineage can be compiled efficiently into a given target language. We consider four target languages, of strictly increasing expressive power (when the size of compilation is restricted to be polynomial in the data size): read-once Boolean formulae, OBDD, FBDD and d-DNNF. For each target, we study the class of database queries that admit polynomial size representation: these queries can also be evaluated in PTIME over probabilistic databases. When queries are restricted to conjunctive queries without self-joins, it was known that these four classes collapse to the class of hierarchical queries, which is also the class of PTIME queries over probabilistic databases. Our main result in this paper is that, in the case of Unions of Conjunctive Queries (UCQ), these classes form a strict hierarchy. Thus, unlike conjunctive queries without self-joins, the expressive power of UCQ differs considerably with respect to these target compilation languages. Moreover, we give a complete characterization of the first two target languages, based on the query’s syntax.

  • Research Article
  • Citations: 8
  • 10.46298/lmcs-18(1:2)2022
The Dichotomy of Evaluating Homomorphism-Closed Queries on Probabilistic Graphs
  • Jan 7, 2022
  • Logical Methods in Computer Science
  • Antoine Amarilli + 1 more

We study the problem of query evaluation on probabilistic graphs, namely, tuple-independent probabilistic databases over signatures of arity two. We focus on the class of queries closed under homomorphisms, or, equivalently, the infinite unions of conjunctive queries. Our main result states that the probabilistic query evaluation problem is #P-hard for all unbounded queries from this class. As bounded queries from this class are equivalent to a union of conjunctive queries, they are already classified by the dichotomy of Dalvi and Suciu (2012). Hence, our result and theirs imply a complete data complexity dichotomy, between polynomial time and #P-hardness, on evaluating homomorphism-closed queries over probabilistic graphs. This dichotomy covers in particular all fragments of infinite unions of conjunctive queries over arity-two signatures, such as negation-free (disjunctive) Datalog, regular path queries, and a large class of ontology-mediated queries. The dichotomy also applies to a restricted case of probabilistic query evaluation called generalized model counting, where fact probabilities must be 0, 0.5, or 1. We show the main result by reducing from the problem of counting the valuations of positive partitioned 2-DNF formulae, or from the source-to-target reliability problem in an undirected graph, depending on properties of minimal models for the query.
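
The hardness proof above reduces from counting the valuations of positive partitioned 2-DNF formulae. As a rough illustration of that source problem only (not of the reduction), the brute-force sketch below counts satisfying valuations of a formula of the form OR over pairs (x AND y), with the x and y variables drawn from two disjoint sets. The name `count_pp2dnf` and the edge-list encoding are assumptions; the enumeration is exponential by design.

```python
from itertools import product


def count_pp2dnf(xs, ys, edges):
    """Count valuations of the variables xs + ys that satisfy
    OR over (x, y) in edges of (x AND y). Brute force, exponential."""
    variables = list(xs) + list(ys)
    count = 0
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if any(val[x] and val[y] for x, y in edges):
            count += 1
    return count


# (x1 AND y1) OR (x2 AND y1), i.e. y1 AND (x1 OR x2).
n_models = count_pp2dnf(["x1", "x2"], ["y1"], [("x1", "y1"), ("x2", "y1")])
```

If every variable is true independently with probability 0.5, the probability of the formula is the count divided by 2 to the number of variables, which is how counting connects to the generalized model counting setting mentioned in the abstract.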

  • Research Article
  • 10.1145/3771726
Counting Answers to Unions of Conjunctive Queries: Natural Tractability Criteria and Meta-Complexity
  • Oct 14, 2025
  • ACM Transactions on Computational Logic
  • Jacob Focke + 3 more

We study the problem of counting answers to unions of conjunctive queries (UCQs) under structural restrictions on the input query. Concretely, given a class \(C\) of UCQs, the problem \(\#\text{UCQ}(C)\) provides as input a UCQ \(\Psi\in C\) and a database \(\mathcal{D}\) and the problem is to compute the number of answers of \(\Psi\) in \(\mathcal{D}\). Chen and Mengel [PODS’16] have shown that for any recursively enumerable class \(C\), the problem \(\#\text{UCQ}(C)\) is either fixed-parameter tractable or hard for one of the parameterised complexity classes \(\mathrm{W}[1]\) or \(\#\mathrm{W}[1]\). However, their tractability criterion is unwieldy in the sense that, given any concrete class \(C\) of UCQs, it is not easy to determine how hard it is to count answers to queries in \(C\). Moreover, given a single specific UCQ \(\Psi\), it is not easy to determine how hard it is to count answers to \(\Psi\). In this work, we address the question of finding a natural tractability criterion: The combined conjunctive query of a UCQ \(\Psi=\varphi_{1}\vee\dots\vee\varphi_{\ell}\) is the conjunctive query \(\boldsymbol{\wedge}\left(\Psi\right)=\varphi_{1}\wedge\dots\wedge\varphi_{\ell}\). We show that under natural closure properties of \(C\), the problem \(\#\text{UCQ}(C)\) is fixed-parameter tractable if and only if the combined conjunctive queries of UCQs in \(C\), and their contracts, have bounded treewidth. A contract of a conjunctive query is an augmented structure, taking into account how the quantified variables are connected to the free variables; if all variables are free, then a conjunctive query is equal to its contract, and in this special case the criterion for fixed-parameter tractability of \(\#\text{UCQ}(C)\) thus simplifies to the combined queries having bounded treewidth. Finally, we give evidence that a closure property on \(C\) is necessary for obtaining a natural tractability criterion: We show that even for a single UCQ \(\Psi\), the meta problem of deciding whether \(\#\text{UCQ}(\{\Psi\})\) can be solved in time \(O(|\mathcal{D}|^{d})\) is \(\mathrm{NP}\)-hard for any fixed \(d\geq 1\). Moreover, we prove that a known exponential-time algorithm for solving the meta problem is optimal under assumptions from fine-grained complexity theory. As a corollary of our reduction, we also establish that approximating the Weisfeiler-Leman-Dimension of a UCQ is \(\mathrm{NP}\)-hard.

  • Conference Article
  • Citations: 82
  • 10.1145/1142351.1142404
On the decidability and finite controllability of query processing in databases with incomplete information
  • Jun 26, 2006
  • Riccardo Rosati

In this paper we study queries over relational databases with integrity constraints (ICs). The main problem we analyze is OWA query answering, i.e., query answering over a database with ICs under open-world assumption. The kinds of ICs that we consider are functional dependencies (in particular key dependencies) and inclusion dependencies; the query languages we consider are conjunctive queries (CQs), union of conjunctive queries (UCQs), CQs and UCQs with negation and/or inequality. We present a set of results about the decidability and finite controllability of OWA query answering under ICs. In particular: (i) we identify the decidability/undecidability frontier for OWA query answering under different combinations of the ICs allowed and the query language allowed; (ii) we study OWA query answering both over finite databases and over unrestricted databases, and identify the cases in which such a problem is finitely controllable, i.e., when OWA query answering over finite databases coincides with OWA query answering over unrestricted databases. Moreover, we are able to easily turn the above results into new results about implication of ICs and query containment under ICs, due to the deep relationship between OWA query answering and these two classical problems in database theory. In particular, we close two long-standing open problems in query containment, since we prove finite controllability of containment of conjunctive queries both under arbitrary inclusion dependencies and under key and foreign key dependencies. Besides their theoretical interest, we believe that the results of our investigation are very relevant in many research areas which have recently dealt with databases under an incomplete information assumption: e.g., view-based information access, ontology-based information systems, data integration, data exchange, and peer-to-peer information systems.

  • Conference Article
  • Citations: 8
  • 10.1145/3375395.3387642
Solving a Special Case of the Intensional vs Extensional Conjecture in Probabilistic Databases
  • Jun 14, 2020
  • Mikaël Monet

We consider the problem of exact probabilistic inference for Union of Conjunctive Queries (UCQs) on tuple-independent databases. For this problem, two approaches currently coexist. In the extensional method, query evaluation is performed by exploiting the structure of the query, and relies heavily on the use of the inclusion-exclusion principle. In the intensional method, one first builds a representation of the lineage of the query in a tractable formalism of knowledge compilation. The chosen formalism should then ensure that the probability can be efficiently computed using simple disjointness and independence assumptions, without the need of performing inclusion-exclusion. The extensional approach has long been thought to be strictly more powerful than the intensional approach, the reason being that for some queries, the use of inclusion-exclusion seemed unavoidable. In this paper we introduce a new technique to construct lineage representations as deterministic decomposable circuits in polynomial time. We prove that this technique applies to a class of UCQs that had been conjectured to separate the complexity of the two approaches. In essence, we show that relying on the inclusion-exclusion formula can be avoided by using negation. This result brings back hope to prove that the intensional approach can handle all tractable UCQs.

  • Book Chapter
  • Citations: 88
  • 10.1007/11965893_12
The Limits of Querying Ontologies
  • Jan 1, 2006
  • Riccardo Rosati

We study query answering in Description Logics (DLs). In particular, we consider conjunctive queries, unions of conjunctive queries, and their extensions with safe negation or inequality, which correspond to well-known classes of relational algebra queries. We provide a set of decidability, undecidability and complexity results for answering queries of the above languages over various classes of Description Logics knowledge bases. In general, such results show that extending standard reasoning tasks in DLs to answering relational queries is unfeasible in many DLs, even in inexpressive ones. In particular: (i) answering even simple conjunctive queries is undecidable in some very expressive DLs in which standard DL reasoning is decidable; (ii) in DLs where answering (unions of) conjunctive queries is decidable, adding the possibility of expressing safe negation or inequality leads in general to undecidability of query answering, even in DLs of very limited expressiveness. We also highlight the negative consequences of these results for the integration of ontologies and rules. We believe that these results have important implications for ontology-based information access, in particular for the design of query languages for ontologies.

  • Conference Article
  • Citations: 12
  • 10.4230/lipics.icalp.2019.104
Boundedness of Conjunctive Regular Path Queries
  • Apr 25, 2019
  • DROPS (Schloss Dagstuhl – Leibniz Center for Informatics)
  • Pablo Barceló + 2 more

We study the boundedness problem for unions of conjunctive regular path queries with inverses (UC2RPQs). This is the problem of, given a UC2RPQ, checking whether it is equivalent to a union of conjunctive queries (UCQ). We show the problem to be ExpSpace-complete, thus coinciding with the complexity of containment for UC2RPQs. As a corollary, when a UC2RPQ is bounded, it is equivalent to a UCQ of at most triple-exponential size, and in fact we show that this bound is optimal. We also study better behaved classes of UC2RPQs, namely acyclic UC2RPQs of bounded thickness, and strongly connected UCRPQs, whose boundedness problem is, respectively, PSpace-complete and Pi_2^P-complete. Most upper bounds exploit results on limitedness for distance automata, in particular extending the model with alternation and two-wayness, which may be of independent interest.

  • Research Article
  • 10.1145/3651614
Counting Answers to Unions of Conjunctive Queries: Natural Tractability Criteria and Meta-Complexity
  • May 10, 2024
  • Proceedings of the ACM on Management of Data
  • Jacob Focke + 3 more

We study the problem of counting answers to unions of conjunctive queries (UCQs) under structural restrictions on the input query. Concretely, given a class C of UCQs, the problem #UCQ(C) provides as input a UCQ Ψ ∈ C and a database D and the problem is to compute the number of answers of Ψ in D. Chen and Mengel [PODS'16] have shown that for any recursively enumerable class C, the problem #UCQ(C) is either fixed-parameter tractable or hard for one of the parameterised complexity classes W[1] or #W[1]. However, their tractability criterion is unwieldy in the sense that, given any concrete class C of UCQs, it is not easy to determine how hard it is to count answers to queries in C. Moreover, given a single specific UCQ Ψ, it is not easy to determine how hard it is to count answers to Ψ. In this work, we address the question of finding a natural tractability criterion: The combined conjunctive query of a UCQ Ψ = φ_1 ∨ ... ∨ φ_ℓ is the conjunctive query ∧(Ψ) = φ_1 ∧ ... ∧ φ_ℓ. We show that under natural closure properties of C, the problem #UCQ(C) is fixed-parameter tractable if and only if the combined conjunctive queries of UCQs in C, and their contracts, have bounded treewidth. A contract of a conjunctive query is an augmented structure, taking into account how the quantified variables are connected to the free variables; if all variables are free, then a conjunctive query is equal to its contract, and in this special case the criterion for fixed-parameter tractability of #UCQ(C) thus simplifies to the combined queries having bounded treewidth. Finally, we give evidence that a closure property on C is necessary for obtaining a natural tractability criterion: We show that even for a single UCQ Ψ, the meta problem of deciding whether #UCQ({Ψ}) can be solved in time O(|D|^d) is NP-hard for any fixed d ≥ 1. Moreover, we prove that a known exponential-time algorithm for solving the meta problem is optimal under assumptions from fine-grained complexity theory. As a corollary of our reduction, we also establish that approximating the Weisfeiler-Leman-Dimension of a UCQ is NP-hard.

  • Research Article
  • 10.46298/lmcs-21(1:29)2025
Unbalanced Triangle Detection and Enumeration Hardness for Unions of Conjunctive Queries
  • Mar 27, 2025
  • Logical Methods in Computer Science
  • Karl Bringmann + 1 more

We study the enumeration of answers to Unions of Conjunctive Queries (UCQs) with optimal time guarantees. More precisely, we wish to identify the queries that can be solved with linear preprocessing time and constant delay. Despite the basic nature of this problem, it was shown only recently that UCQs can be solved within these time bounds if they admit free-connex union extensions, even if all individual CQs in the union are intractable with respect to the same complexity measure. Our goal is to understand whether there exist additional tractable UCQs, not covered by the currently known algorithms. As a first step, we show that some previously unclassified UCQs are hard using the classic 3SUM hypothesis, via a known reduction from 3SUM to triangle listing in graphs. As a second step, we identify a question about a variant of this graph task that is unavoidable if we want to classify all self-join-free UCQs: is it possible to decide the existence of a triangle in a vertex-unbalanced tripartite graph in linear time? We prove that this task is equivalent in hardness to some family of UCQs. Finally, we show a dichotomy for unions of two self-join-free CQs if we assume the answer to this question is negative. In conclusion, this paper pinpoints a computational barrier in the form of a single decision problem that is key to advancing our understanding of the enumeration complexity of many UCQs. Without a breakthrough for unbalanced triangle detection, we have no hope of finding an efficient algorithm for additional unions of two self-join-free CQs. On the other hand, a sufficiently efficient unbalanced triangle detection algorithm can be turned into an efficient algorithm for a family of UCQs currently not known to be tractable.

  • Conference Article
  • Citations: 33
  • 10.4230/lipics.icdt.2018.8
Answering UCQs under Updates and in the Presence of Integrity Constraints
  • Jan 1, 2018
  • arXiv (Cornell University)
  • Christoph Berkholz + 2 more

We investigate the query evaluation problem for fixed queries over fully dynamic databases where tuples can be inserted or deleted. The task is to design a dynamic data structure that can immediately report the new result of a fixed query after every database update. We consider unions of conjunctive queries (UCQs) and focus on the query evaluation tasks testing (decide whether an input tuple belongs to the query result), enumeration (enumerate, without repetition, all tuples in the query result), and counting (output the number of tuples in the query result). We identify three increasingly restrictive classes of UCQs which we call t-hierarchical, q-hierarchical, and exhaustively q-hierarchical UCQs. Our main results provide the following dichotomies: If the query's homomorphic core is t-hierarchical (q-hierarchical, exhaustively q-hierarchical), then the testing (enumeration, counting) problem can be solved with constant update time and constant testing time (delay, counting time). Otherwise, it cannot be solved with sublinear update time and sublinear testing time (delay, counting time), unless the OV-conjecture and/or the OMv-conjecture fails. We also study the complexity of query evaluation in the dynamic setting in the presence of integrity constraints, and we obtain similar dichotomy results for the special case of small domain constraints (i.e., constraints which state that all values in a particular column of a relation belong to a fixed domain of constant size).

  • Research Article
  • Citations: 15
  • 10.1016/j.artint.2021.103474
Open-world probabilistic databases: Semantics, algorithms, complexity
  • Feb 15, 2021
  • Artificial Intelligence
  • İsmail İlkan Ceylan + 2 more


  • Conference Article
  • Citations: 50
  • 10.1145/3196959.3196967
Joining Extractions of Regular Expressions
  • May 27, 2018
  • Dominik D Freydenberger + 2 more

Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (interval positions) from text. These relations can be further manipulated via the relational algebra as studied in the context of "document spanners," Fagin et al.'s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. Such queries have been investigated in prior work on document spanners, but little is known about the (combined) complexity of their evaluation. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness hits already single-character text. Yet, the upper bounds from the relational world do not carry over. Unlike the relational world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source of hardness is that it may be intractable to instantiate the relation defined by a regex formula, simply because it has an exponential number of tuples. Yet, we are able to establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ.

  • Dissertation
  • 10.5353/th_b4775313
Managing query quality in probabilistic databases
  • Jan 1, 2011
  • Xiang Li

In many emerging applications, such as sensor networks, location-based services, and data integration, the database is inherently uncertain. To handle a large amount of uncertain data, probabilistic databases have been recently proposed, where probabilistic queries are enabled to provide answers with statistical guarantees. In this thesis, we study the important issues of managing the quality of a probabilistic database. We first address the problem of measuring the ambiguity, or quality, of a probabilistic query. This is accomplished by computing the PWS-quality score, a recently proposed measure for quantifying the ambiguity of query answers under the possible world semantics. We study the computation of the PWS-quality for the top-k query. This problem is not trivial, since directly computing the top-k query score is computationally expensive. To tackle this challenge, we propose efficient approximate algorithms for deriving the quality score of a top-k query. We have performed experiments on both synthetic and real data to validate their performance and accuracy. Our second contribution is to study how to use the PWS-quality score to coordinate the process of cleaning uncertain data. Removing ambiguous data from a probabilistic database can often give us a higher-quality query result. However, this operation requires some external knowledge (e.g., an updated value from a sensor source), and is thus not without cost. It is important to choose the correct object to clean, in order to (1) achieve a high quality gain, and (2) incur a low cleaning cost. In this thesis, we examine different cleaning methods for a probabilistic top-k query. We also study an interesting problem where different query users have their own budgets available for cleaning. We demonstrate how an optimal solution, in terms of the lowest cleaning costs, can be achieved, for probabilistic range and maximum queries. An extensive evaluation reveals that these solutions are highly efficient and accurate.

  • Conference Article
  • Citations: 3
  • 10.1145/2076623.2076634
Scrubbing query results from probabilistic databases
  • Jan 1, 2011
  • Jianwen Chen + 2 more

Queries over probabilistic databases lead to probabilistic results. As the process of arriving at these results is based on underlying data probabilities, we believe involving a user in the loop of query processing and leveraging the user's personal knowledge to deal with uncertain data, will enable the system to scrub (correct) and tailor its probabilistic query results towards a better quality from the perspective of the specific user. In this paper, we propose to open the black box of a probabilistic database query engine, and explain to the user how the engine comes up with the probabilistic query result as well as which uncertain tuples in the database the result is derived from. In this way, the user based on his/her knowledge about uncertain information can not only decide how much confidence to be placed on the query engine, but also help clarify some uncertain information so that the query engine can re-generate an improved query result. Two particular issues associated with such a probabilistic database query framework are addressed: (i) how to interact with a user for answer explanation and uncertainty clarification without bringing much burden to the user, and (ii) how to scrub/correct the query result without incurring much computation overhead to the query engine. Our performance study demonstrates the accuracy effectiveness and computational efficiency achieved by the proposed framework.
