Abstract

Cardinality estimation is an important step in cost-based database query optimization. The accuracy of the estimates directly affects the ability of an optimizer to identify the most efficient query execution plan correctly. In this paper, we study cardinality estimation of LIKE-queries, i.e., queries that use the LIKE-operator to match a pattern with wildcards against string-valued attributes. While both traditional and machine-learning-based approaches have been proposed to tackle this problem, we argue that they all suffer from drawbacks. Most importantly, many state-of-the-art approaches are not designed for patterns that contain wildcards in-between characters. Based on past research on neural language models, we introduce the LIKE-Pattern Language Model (LPLM) that uses a new language and a novel probability distribution function to capture the semantics of general LIKE-patterns. We also propose a method to generate training data for our model. We demonstrate that our method outperforms state-of-the-art approaches in terms of precision (Q-error), while offering comparable runtime performance and memory requirements.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call