Abstract
The efficiency of query processing in the Spark SQL big data processing engine is significantly affected by execution plans and allocated resources. However, existing cost models for Spark SQL rely on hand-crafted rules, and while learning-based cost models have been proposed for relational databases, they do not account for available resources. To address this gap, we propose a resource-aware deep learning model that automatically predicts query plan execution times from historical data. To train our model, we embed query execution plans within a query plan tree and extract features from allocated resources. An adaptive attention mechanism is integrated into the deep learning model to enhance prediction accuracy. Additionally, we extract sufficient features to represent data information and learn the effect of the data on query execution, which reduces the need for model retraining when the data changes. The experimental results demonstrate that our deep cost model outperforms traditional rule-based methods and relational database learning-based optimizers in predicting query plan execution times.
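The pipeline the abstract describes can be illustrated with a minimal sketch: encode each operator of a plan tree bottom-up, pool the node encodings with softmax attention, and scale the result by resource features. All names, weights, and combination rules below are hypothetical toy stand-ins; the actual model is a trained deep network, not this closed-form computation.

```python
import math

def encode_node(node, encodings):
    """Recursively encode a plan-tree node from its children (toy rule:
    own cost plus a discounted sum of child encodings)."""
    child_sum = sum(encode_node(c, encodings) for c in node.get("children", []))
    enc = node["cost"] + 0.5 * child_sum  # hypothetical combination rule
    encodings.append(enc)
    return enc

def attention_pool(values):
    """Softmax-weighted sum over node encodings -- a stand-in for the
    adaptive attention mechanism."""
    weights = [math.exp(v) for v in values]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))

def predict_time(plan, resources):
    """Combine the attention-pooled plan encoding with allocated-resource
    features (cores, memory) to produce a predicted execution time."""
    encodings = []
    encode_node(plan, encodings)
    pooled = attention_pool(encodings)
    # More allocated resources -> lower predicted time (toy scaling).
    return pooled / (0.5 * resources["cores"] + 0.1 * resources["memory_gb"])

# Example: a scan feeding a join, run on a small executor allocation.
plan = {"cost": 1.0, "children": [{"cost": 2.0}, {"cost": 0.5}]}
print(predict_time(plan, {"cores": 4, "memory_gb": 8}))
```

The sketch captures only the shape of the approach: the tree traversal mirrors the query-plan embedding, the softmax pooling mirrors the attention step, and the resource divisor mirrors resource-awareness, while the learned parameters are replaced by fixed constants.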