Aggregate-based Training Phase for ML-based Cardinality Estimation

Lucas Woltmann, Claudio Hartmann, Wolfgang Lehner, Dirk Habich

doi:10.1007/s13222-021-00400-z

Abstract

Cardinality estimation is a fundamental task in database query processing and optimization. As shown in recent papers, machine learning (ML)-based approaches may deliver more accurate cardinality estimations than traditional approaches. However, many training queries have to be executed during the model training phase to learn a data-dependent ML model, which makes this phase very time-consuming. Many of those training or example queries use the same base data, have the same query structure, and differ only in their selective predicates. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-based training phase for ML-based cardinality estimation approaches in this paper. As we show with different workloads in our evaluation, our aggregate-based training phase achieves an average speedup of 90 and thus outperforms indexes.
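The core idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy table `rows`, the helper names, and the inclusive range predicates are all assumptions made for illustration. The base data is aggregated once, independently of any predicate, and every example query is then answered from the much smaller aggregate instead of scanning the base rows.

```python
from collections import Counter

# Hypothetical base table with two attributes (a, b); values are made up.
rows = [(1, 10), (1, 10), (2, 10), (2, 20), (2, 20), (3, 30)]


def count_on_base(rows, a_range, b_range):
    """Cardinality of a conjunctive range query, scanning every base row."""
    (a_lo, a_hi), (b_lo, b_hi) = a_range, b_range
    return sum(1 for a, b in rows if a_lo <= a <= a_hi and b_lo <= b <= b_hi)


def pre_aggregate(rows):
    """Predicate-independent pre-aggregate: one count per distinct (a, b)."""
    return Counter(rows)


def count_on_aggregate(agg, a_range, b_range):
    """Same cardinality, computed by summing the counts of matching groups."""
    (a_lo, a_hi), (b_lo, b_hi) = a_range, b_range
    return sum(
        cnt
        for (a, b), cnt in agg.items()
        if a_lo <= a <= a_hi and b_lo <= b <= b_hi
    )


# Aggregate once, then label all example queries from the aggregate.
agg = pre_aggregate(rows)
example_queries = [((1, 2), (10, 20)), ((2, 3), (20, 30)), ((1, 1), (10, 10))]
labels = [count_on_aggregate(agg, qa, qb) for qa, qb in example_queries]
```

In this sketch the aggregate holds one entry per distinct value combination, so it pays off only when the base data contains many duplicate combinations, and the aggregation cost is amortized over all example queries that share the same structure.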

Highlights

Due to skew and correlation in data managed by database systems (DBMS), query optimization is still an important challenge
We propose a novel training phase based on pre-aggregated data for machine learning (ML)-based cardinality estimation approaches
We made the case for cardinality estimation as a candidate for database support of machine learning for DBMS

Summary

Introduction

Due to skew and correlation in data managed by database systems (DBMS), query optimization is still an important challenge. As shown in recent papers [9, 10], including our own work [11], machine learning-based cardinality estimation approaches are able to meet higher accuracy requirements, especially for highly correlated data. However, their training phase is very time-consuming because many training queries have to be executed. To overcome this shortcoming, we propose a novel training phase based on pre-aggregated data for ML-based cardinality estimation approaches. This is an extended version of previous work [12]. We introduce our general solution approach of an aggregate-based training phase, which pre-aggregates the base data using the data cube concept and executes the example queries over this pre-aggregated data. We also present experimental evaluation results for four different workloads for the training phase of ML-based cardinality estimation.
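The data cube concept mentioned above can be sketched as follows. This is an illustrative assumption rather than the paper's implementation: the column names, the toy rows, and the functions `build_cube` and `cardinality` are hypothetical. The sketch materializes one count aggregate per subset of columns and answers each example query from the aggregate that covers exactly the columns in its predicates.

```python
from collections import Counter
from itertools import combinations

# Hypothetical schema and rows; all names and values are illustrative only.
COLUMNS = ("a", "b", "c")
rows = [
    {"a": 1, "b": 10, "c": 100},
    {"a": 1, "b": 10, "c": 200},
    {"a": 2, "b": 10, "c": 100},
    {"a": 2, "b": 20, "c": 200},
    {"a": 2, "b": 20, "c": 200},
    {"a": 3, "b": 30, "c": 300},
]


def build_cube(rows, columns):
    """Materialize a count aggregate for every subset of the columns."""
    cube = {}
    for k in range(len(columns) + 1):
        for subset in combinations(columns, k):
            cube[subset] = Counter(tuple(r[c] for c in subset) for r in rows)
    return cube


def cardinality(cube, predicates):
    """Answer a conjunctive range query from the matching aggregate.

    predicates maps a column name to an inclusive (lo, hi) range.
    """
    subset = tuple(c for c in COLUMNS if c in predicates)
    cuboid = cube[subset]
    return sum(
        cnt
        for key, cnt in cuboid.items()
        if all(
            predicates[c][0] <= v <= predicates[c][1]
            for c, v in zip(subset, key)
        )
    )


cube = build_cube(rows, COLUMNS)
n_a = cardinality(cube, {"a": (2, 3)})
n_ac = cardinality(cube, {"a": (1, 2), "c": (200, 200)})
n_all = cardinality(cube, {})
```

Materializing every column subset grows exponentially with the number of columns, so a practical system would presumably restrict the aggregates to those that the workload actually needs.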

Machine Learning Models for DBMS

Machine Learning Support for DBMS

Case Study

Global Model Approach

Local Model Approach

Training Phase Workload Analysis

Training on Pre-Aggregated Data

Grouping Sets as Pre-aggregates

Benefit Criterion

Implementation

Analyzer Component

Experimental Setting

Evaluation

Experimental Results

Main Findings

Related Work

Conclusion

Journal: Datenbank-Spektrum	Publication Date: Jan 10, 2022
Citations: 2	License type: open-access
