Abstract

We investigate the application of the Shapley value to quantifying the contribution of a tuple to a query answer. The Shapley value is a widely known numerical measure in cooperative game theory, and in many applications of game theory, for assessing the contribution of a player to a coalition game. It was established in the 1950s and is theoretically justified as the unique wealth-distribution measure that satisfies a set of natural axioms. While this value has been investigated in several areas, it has received little attention in data management. We study this measure in the context of conjunctive and aggregate queries by defining corresponding coalition games. We provide algorithmic and complexity-theoretic results on the computation of Shapley-based contributions to query answers, and for the hard cases we present approximation algorithms.
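To make the coalition game concrete, here is a minimal Python sketch that computes exact Shapley values of facts for a Boolean CQ by enumerating every subset of endogenous facts, so it takes exponential time. It assumes the game v(E) = q(D_x ∪ E) − q(D_x), where D_x is the set of exogenous facts, E ranges over subsets of the endogenous facts, and q evaluates to 1 or 0. The Shapley value of a fact f is then the sum of the marginal contributions v(E ∪ {f}) − v(E), each weighted by |E|!(n − |E| − 1)!/n!. The fact encoding, the helper names `holds` and `shapley`, and the example query are illustrative assumptions, not code from the paper.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

# A fact is a tuple (relation_name, value, ...); a CQ is a list of atoms
# (relation_name, var, ...) whose arguments are all variables (no constants).

def holds(query, facts):
    """Return 1 if the Boolean CQ is satisfied by the set of facts, else 0."""
    def extend(i, assignment):
        if i == len(query):
            return True
        rel, *args = query[i]
        for fact in facts:
            if fact[0] != rel or len(fact) != len(query[i]):
                continue
            binding = dict(assignment)
            # setdefault binds a fresh variable or checks an existing binding
            if all(binding.setdefault(a, val) == val for a, val in zip(args, fact[1:])):
                if extend(i + 1, binding):
                    return True
        return False
    return 1 if extend(0, {}) else 0

def shapley(query, exogenous, endogenous, f):
    """Exact Shapley value of endogenous fact f; enumerates 2^(n-1) subsets."""
    others = [g for g in endogenous if g != f]
    n = len(endogenous)
    base = holds(query, set(exogenous))

    def v(E):  # the assumed game: v(E) = q(D_x ∪ E) - q(D_x)
        return holds(query, set(exogenous) | set(E)) - base

    total = Fraction(0)
    for k in range(len(others) + 1):
        weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
        for E in combinations(others, k):
            total += weight * (v(E + (f,)) - v(E))
    return total

# Hypothetical example: q() :- R(x, y), S(y), with every fact endogenous.
query = [("R", "x", "y"), ("S", "y")]
endo = [("R", "a", 1), ("R", "b", 1), ("S", 1)]
for fact in endo:
    print(fact, shapley(query, set(), endo, fact))
```

In this example S(1) gets 2/3 and each R fact gets 1/6; the values sum to 1, the value of the full game, as the efficiency axiom requires.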

Highlights

  • Salimi et al. [SBSdB16] proposed the causal effect of a fact f: assuming that endogenous facts are removed independently and uniformly at random, what is the difference between the expected query answer when f is present and when f is absent? Interestingly, as we show here, this value is the same as the Banzhaf power index, which has been studied in the context of wealth distribution in cooperative games [DS79], and it differs from the Shapley value [Rot88, Chapter 5].

  • We investigate the problem of computing the Shapley value with respect to Boolean Conjunctive Queries (CQs) without self-joins.
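The difference between the causal effect (the Banzhaf index) and the Shapley value shows up already on a three-fact example. In this sketch, an illustrative assumption rather than code from the paper, the Boolean CQ q() :- R(x, y), S(y) over facts R(a,1), R(b,1), S(1) is abstracted as a monotone predicate on sets of fact names. The Banzhaf value averages a fact's marginal contribution uniformly over all 2^(n−1) subsets of the other facts, which matches keeping each other fact independently with probability 1/2. The Shapley value instead gives each subset size the same total weight 1/n.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def marginals(players, v, f):
    """Yield (|E|, v(E ∪ {f}) - v(E)) for every subset E of the other players."""
    others = [p for p in players if p != f]
    for k in range(len(others) + 1):
        for E in combinations(others, k):
            yield k, v(frozenset(E) | {f}) - v(frozenset(E))

def shapley(players, v, f):
    # Weight |E|! (n - |E| - 1)! / n!: each subset size gets total weight 1/n.
    n = len(players)
    return sum((Fraction(factorial(k) * factorial(n - k - 1), factorial(n)) * d
                for k, d in marginals(players, v, f)), Fraction(0))

def banzhaf(players, v, f):
    # Uniform weight 1 / 2^(n-1): each other fact is present with probability 1/2,
    # so this equals E[v | f present] - E[v | f absent] (the causal effect).
    n = len(players)
    return sum((Fraction(d, 2 ** (n - 1)) for _, d in marginals(players, v, f)),
               Fraction(0))

# Hypothetical example: q() :- R(x, y), S(y), abstracted over fact names.
players = ["Ra", "Rb", "S1"]

def v(E):
    return int("S1" in E and ("Ra" in E or "Rb" in E))

for p in players:
    print(p, "Shapley:", shapley(players, v, p), "Banzhaf:", banzhaf(players, v, p))
```

Here S1 gets 3/4 under Banzhaf versus 2/3 under Shapley, and each R fact gets 1/4 versus 1/6. The Banzhaf values sum to 5/4 rather than to v(all facts) = 1, so unlike the Shapley value they do not split the total "wealth" among the facts.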

Summary

Introduction

The Shapley value is named after Lloyd Shapley, who introduced the value in a seminal 1952 article [Sha53]. We apply the Shapley value to quantifying the contribution of database facts (tuples) to query results. We study the complexity of computing the Shapley value for Conjunctive Queries (CQs) and for aggregate functions over CQs. Our main results are as follows. Our results immediately generalize to non-Boolean CQs and group-by operators, where the goal is to compute the Shapley value of a fact with respect to each tuple in the answer of a query. We have added the full proof of our main result, namely the dichotomy in the complexity of computing the Shapley value for Boolean CQs (Theorem 4.1), as well as the proofs of our results for aggregate queries over CQs (Theorem 4.8 and Proposition 4.11).
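For aggregates that add up a per-answer quantity, such as count over a non-Boolean CQ, the additivity of the Shapley value implies that a fact's contribution to the aggregate equals the sum of its contributions to the Boolean games "t is an answer", one per answer tuple t. The sketch below checks this numerically for a hypothetical query q(x) :- R(x, y), S(y) with the count aggregate and all facts endogenous. The hard-coded evaluator `answers` and the game helpers are illustrative assumptions, not the paper's construction.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def shapley(players, v, f):
    """Exact Shapley value of player f in game v (exponential enumeration)."""
    others = [p for p in players if p != f]
    n = len(players)
    total = Fraction(0)
    for k in range(len(others) + 1):
        w = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
        for E in combinations(others, k):
            total += w * (v(frozenset(E) | {f}) - v(frozenset(E)))
    return total

def answers(E):
    """Answers of the hypothetical CQ q(x) :- R(x, y), S(y) over fact set E."""
    return {fact[1] for fact in E if fact[0] == "R" and ("S", fact[2]) in E}

facts = [("R", "a", 1), ("R", "b", 1), ("S", 1)]

def count_game(E):
    # Aggregate game: number of answer tuples produced by the facts in E.
    return len(answers(E))

def answer_game(t):
    # Boolean game for a single answer tuple t: is t in q(E)?
    return lambda E: int(t in answers(E))

all_answers = answers(frozenset(facts))
for f in facts:
    direct = shapley(facts, count_game, f)
    per_answer = sum((shapley(facts, answer_game(t), f) for t in all_answers),
                     Fraction(0))
    assert direct == per_answer  # additivity of the Shapley value
    print(f, direct)
```

S(1) receives 1 and each R fact receives 1/2; the values add up to 2, the number of answers over the full database.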

Preliminaries
Shapley Value of Database Facts
Complexity Results
Related Measures
Conclusions