SQL’s Three-Valued Logic and Certain Answers

Leonid Libkin

doi:10.1145/2877206

Abstract

The goal of the article is to bridge the difference between theoretical and practical approaches to answering queries over databases with nulls. Theoretical research has long ago identified the notion of correctness of query answering over incomplete data: one needs to find certain answers, which are true regardless of how incomplete information is interpreted. This serves as the notion of correctness of query answering, but carries a huge complexity tag. In practice, on the other hand, query answering must be very efficient, and to achieve this, SQL uses three-valued logic for evaluating queries on databases with nulls. Due to the complexity mismatch, the two approaches cannot coincide, but perhaps they are related in some way. For instance, does SQL always produce answers we can be certain about? This is not so: SQL’s and certain answers semantics could be totally unrelated. We show, however, that a slight modification of the three-valued semantics for relational calculus queries can provide the required certainty guarantees. The key point of the new scheme is to fully utilize the three-valued semantics, and classify answers not into certain or noncertain, as was done before, but rather into certainly true, certainly false, or unknown. This yields relatively small changes to the evaluation procedure, which we consider at the level of both declarative (relational calculus) and procedural (relational algebra) queries. These new evaluation procedures give us certainty guarantees even for queries returning tuples with null values.

Highlights

We introduce a new notion of certain answers with nulls, which properly accounts for queries returning tuples containing null values
SQL’s query evaluation engine uses three-valued logic when it comes to handling incomplete information: comparisons involving null values have the truth value unknown [7]
We introduce an evaluation procedure that comes with certainty guarantees for all relational calculus queries

Summary

Introduction

SQL’s query evaluation engine uses three-valued logic when it comes to handling incomplete information: comparisons involving null values have the truth value unknown [7]. This results in a number of well known paradoxes. For instance, two relations R and S with a single numerical attribute A, and assume that S contains a single row with a null value in it. Returns nothing despite the condition in the where clause being a tautology. This is because both null 0 evaluate to unknown and so does their disjunction. For the same reason, the query computing R − S: select R.A from R where R.A not in (select S.A from S)

Objectives

Results

Conclusion