Finding bugs in database systems via query partitioning

Manuel Rigger, Zhendong Su

doi:10.1145/3428279

Abstract

Logic bugs in Database Management Systems (DBMSs) are bugs that cause an incorrect result for a given query, for example, by omitting a row that should be fetched. These bugs are critical, since they are likely to go unnoticed by users. We propose Query Partitioning, a general and effective approach for finding logic bugs in DBMSs. The core idea of Query Partitioning is to, starting from a given original query, derive multiple, more complex queries (called partitioning queries), each of which computes a partition of the result. The individual partitions are then composed to compute a result set that must be equivalent to the original query's result set. A bug in the DBMS is detected when these result sets differ. Our intuition is that due to the increased complexity, the partitioning queries are more likely to stress the DBMS and trigger a logic bug than the original query. As a concrete instance of a partitioning strategy, we propose Ternary Logic Partitioning (TLP), which is based on the observation that a boolean predicate p can either evaluate to TRUE, FALSE, or NULL. Accordingly, a query can be decomposed into three partitioning queries, each of which computes its result on rows or intermediate results for which p, NOT p, and p IS NULL hold. This technique is versatile, and can be used to test WHERE, GROUP BY, as well as HAVING clauses, aggregate functions, and DISTINCT queries. As part of an extensive testing campaign, we found 175 bugs in widely-used DBMSs such as MySQL, TiDB, SQLite, and CockroachDB, 125 of which have been fixed. Notably, 77 of these were logic bugs, while the remaining were error and crash bugs. We expect that the effectiveness and wide applicability of Query Partitioning will lead to its broad adoption in practice, and the formulation of additional partitioning strategies.
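The TLP idea for WHERE clauses described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' tool: it assumes Python's built-in `sqlite3` module as the DBMS under test, and the table `t0`, column `c0`, and helper `tlp_where_check` are hypothetical names chosen for this example. The original query is compared, as a multiset, against the three partitioning queries for `p`, `NOT p`, and `p IS NULL`, composed with `UNION ALL`.

```python
import sqlite3
from collections import Counter


def tlp_where_check(conn, table, predicate):
    """Return True if the original query and the composed TLP partitions agree.

    Hypothetical helper for illustration; `table` and `predicate` are
    interpolated directly, so only pass trusted strings.
    """
    base = f"SELECT * FROM {table}"
    # Original query: every row, with no predicate applied.
    original = conn.execute(base).fetchall()
    # Partitioning queries: rows where p is TRUE, FALSE, and NULL,
    # composed with UNION ALL so that duplicate rows are preserved.
    partitioned = conn.execute(
        f"{base} WHERE {predicate} "
        f"UNION ALL {base} WHERE NOT ({predicate}) "
        f"UNION ALL {base} WHERE ({predicate}) IS NULL"
    ).fetchall()
    # Compare as multisets, since row order is not guaranteed.
    return Counter(original) == Counter(partitioned)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INTEGER)")
conn.executemany("INSERT INTO t0 VALUES (?)", [(1,), (0,), (None,), (1,)])

# On a correct DBMS, the check holds for any predicate over t0.
ok = tlp_where_check(conn, "t0", "c0 > 0")
```

A result of `False` would indicate a discrepancy worth investigating as a potential logic bug. In a testing campaign, the predicate would be generated randomly rather than written by hand.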

Highlights

Database Management Systems (DBMSs) are used ubiquitously
We found a bug in DuckDB, where a complex query using GROUP BY and HAVING clauses, as well as UNION, produced a nondeterministic result
This paper has presented the general idea of Query Partitioning, and a concrete instantiation of this idea, termed Ternary Logic Partitioning (TLP)

Summary

Introduction

Database Management Systems (DBMSs) are used ubiquitously. Most DBMSs allow inserting, deleting, modifying, and querying data from a database using the Structured Query Language (SQL). For a given query, a DBMS might mistakenly omit a record from the result set, fetch a record that should not be in the result set, or compute an incorrect result for a function or operator. Such bugs are difficult for users to detect and might go unnoticed, especially considering the scale of many databases. SQL’s data model is based on bags (i.e., multisets), where the same row can occur multiple times [Guagliardo and Libkin 2017]. This contrasts with the original relational model, which is based on the concept of sets. SQL accordingly provides both a multiset union (UNION ALL) and a set union (UNION), and both operators are used as the composition operator of different TLP test oracles.
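The difference between bag and set semantics can be observed directly with SQL's `UNION ALL` (multiset union) and `UNION` (set union). The sketch below again assumes Python's built-in `sqlite3` module; the table `t1`, its contents, and the fixed predicate `c0 = 1` are hypothetical and chosen only for illustration. Because the column contains no NULL values, the NULL partition is empty here and is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c0 INTEGER)")
# The value 1 occurs twice, so the table is a bag, not a set.
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (1,), (2,)])

partitions = ("SELECT c0 FROM t1 WHERE c0 = 1", "SELECT c0 FROM t1 WHERE NOT (c0 = 1)")

# Multiset composition: duplicates survive, matching a plain SELECT.
bag = conn.execute(" UNION ALL ".join(partitions)).fetchall()
plain = conn.execute("SELECT c0 FROM t1").fetchall()

# Set composition: duplicates are removed, matching SELECT DISTINCT.
as_set = conn.execute(" UNION ".join(partitions)).fetchall()
distinct = conn.execute("SELECT DISTINCT c0 FROM t1").fetchall()

bag_matches = sorted(bag) == sorted(plain)
set_matches = sorted(as_set) == sorted(distinct)
```

Under these assumptions, both comparisons should hold on a correct DBMS, which suggests why a test oracle for an ordinary query would compose partitions with `UNION ALL`, while one for a `DISTINCT` query would use `UNION`.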

Journal: Proceedings of the ACM on Programming Languages
Publication Date: Nov 13, 2020
Citations: 41
License type: CC BY

Similar Papers

Detecting optimization bugs in database engines via non-optimizing reference engine construction
Manuel Rigger ... Zhendong Su
08 Nov 2020

A Demonstration of DLBD: Database Logic Bug Detection System
Xiu Tang ... Gongsheng Yuan
Proceedings of the VLDB Endowment | VOL. 16
01 Aug 2023

Parallel Database Techniques
Scalable Computing Practice and Experience | VOL. 4
03 Jan 2001

ReDRIVE
Marina Drosou ... Evaggelia Pitoura
24 Oct 2011
