Abstract

The efficiency of query processing in the Spark SQL big data processing engine is significantly affected by execution plans and allocated resources. However, existing cost models for Spark SQL rely on hand-crafted rules. While learning-based cost models have been proposed for relational databases, they do not consider available resources. To address this issue, we propose a resource-aware deep learning model capable of automatically predicting query plan execution times based on historical data. To train our model, we embed query execution plans within a query plan tree and extract features from allocated resources. An adaptive attention mechanism is integrated into the deep learning model to enhance prediction accuracy. Additionally, we extract sufficient features to represent the underlying data and learn its effect on query execution, which reduces the need for model retraining when the data changes. The experimental results demonstrate that our deep cost model outperforms traditional rule-based methods and learning-based optimizers for relational databases in predicting query plan execution times.
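To make the high-level approach concrete, the following is a minimal sketch (not the authors' actual architecture) of a cost model that encodes a query plan tree bottom-up, applies attention over the node embeddings, concatenates resource features, and regresses execution time. All class and parameter names (`PlanNode`, `TreeAttentionCostModel`, the feature dimensions, the resource vector) are hypothetical placeholders chosen for illustration.

```python
import torch
import torch.nn as nn

class PlanNode:
    """A query-plan operator with a feature vector and child nodes (hypothetical structure)."""
    def __init__(self, features, children=None):
        self.features = features          # operator/data features as a FloatTensor
        self.children = children or []    # list of child PlanNode objects

class TreeAttentionCostModel(nn.Module):
    """Sketch: encode the plan tree bottom-up, attend over node embeddings,
    append resource features, and predict execution time."""
    def __init__(self, node_dim, resource_dim, hidden_dim=64):
        super().__init__()
        self.node_proj = nn.Linear(node_dim, hidden_dim)
        self.child_proj = nn.Linear(hidden_dim, hidden_dim)
        self.attn = nn.Linear(hidden_dim, 1)              # adaptive attention scores
        self.regressor = nn.Sequential(
            nn.Linear(hidden_dim + resource_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def encode(self, node, embeddings):
        # Recursively embed children, then combine them with the node's own features.
        child_embs = [self.encode(c, embeddings) for c in node.children]
        h = torch.relu(self.node_proj(node.features))
        if child_embs:
            h = h + self.child_proj(torch.stack(child_embs).mean(dim=0))
        embeddings.append(h)
        return h

    def forward(self, root, resources):
        embeddings = []
        self.encode(root, embeddings)
        H = torch.stack(embeddings)                        # (num_nodes, hidden_dim)
        weights = torch.softmax(self.attn(H), dim=0)       # attention over plan nodes
        plan_vec = (weights * H).sum(dim=0)
        return self.regressor(torch.cat([plan_vec, resources]))

# Usage: a two-node plan (scan -> aggregate) with a vector of executor-resource features.
scan = PlanNode(torch.randn(8))
agg = PlanNode(torch.randn(8), children=[scan])
model = TreeAttentionCostModel(node_dim=8, resource_dim=4)
predicted_time = model(agg, torch.tensor([4.0, 8.0, 2.0, 1.0]))  # e.g. cores, memory, ...
```

The resource vector here stands in for the allocated-resource features the abstract mentions; in practice these would be drawn from the Spark configuration (executor count, cores, memory) rather than hard-coded values.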
