Abstract

In a smart grid, massive amounts of data are generated during the production, transmission, and consumption of electricity, and complex, varied queries with multiple join and selection operations often need to be run on such data. Several studies have focused on improving the performance of query evaluation by applying machine learning techniques to query optimization problems. However, these studies are limited to processing queries over data in a single environment. In this paper, we propose a join order optimization model based on Proximal Policy Optimization (PPO) for Spark SQL, aimed at improving retrieval performance on large amounts of data. The model is trained with the cost computation method of Spark SQL, using the costs of the join plans it generates as rewards. Because Spark SQL explores only a small search space, the model can find join plans with lower costs than those Spark SQL finds. We demonstrate that the proposed model generates join plans with costs similar to or lower than those of Spark SQL's plans, without executing Spark SQL's optimization algorithm.
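A full PPO agent does not fit in a short example, but the environment side of the idea can be sketched: a join plan is built one join at a time, and the cost of each step becomes a reward. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. `JoinOrderEnv`, `Plan`, `rollout`, and `greedy_policy` are hypothetical names; the cost model (sum of estimated intermediate cardinalities under an independence assumption) is a simplified stand-in, not Spark SQL's actual cost computation; the reward is assumed to be the negative cost; and the tables, row counts, and selectivities are made up. A trained PPO policy would take the place of `greedy_policy`.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    relations: frozenset  # base tables covered by this (sub)plan
    rows: float           # estimated output cardinality
    desc: str             # readable join tree, e.g. "(A JOIN B)"


class JoinOrderEnv:
    """Toy join-ordering environment (hypothetical, not Spark SQL code).

    Each action joins two current subplans. The cost of a step is the
    estimated cardinality of the intermediate result, and the reward is
    its negative, so maximizing return minimizes total plan cost.
    """

    def __init__(self, cardinalities, selectivities):
        self.card = cardinalities  # {table: row count}
        self.sel = selectivities   # {frozenset({a, b}): predicate selectivity}
        self.reset()

    def reset(self):
        self.plans = [Plan(frozenset([t]), float(n), t) for t, n in self.card.items()]
        return self.plans

    def actions(self):
        # Any pair of subplans may be joined, so bushy trees are reachable.
        return list(itertools.combinations(range(len(self.plans)), 2))

    def join_rows(self, left, right):
        # Independence assumption: multiply by every predicate spanning both
        # sides; pairs without a predicate get selectivity 1.0 (cross product).
        rows = left.rows * right.rows
        for a in left.relations:
            for b in right.relations:
                rows *= self.sel.get(frozenset((a, b)), 1.0)
        return rows

    def step(self, action):
        i, j = action
        left, right = self.plans[i], self.plans[j]
        rows = self.join_rows(left, right)
        joined = Plan(left.relations | right.relations, rows,
                      f"({left.desc} JOIN {right.desc})")
        self.plans = [p for k, p in enumerate(self.plans) if k not in (i, j)]
        self.plans.append(joined)
        return self.plans, -rows, len(self.plans) == 1


def rollout(env, policy):
    """Play one episode; return the final join tree and its total cost."""
    env.reset()
    total_reward, done = 0.0, False
    while not done:
        _, reward, done = env.step(policy(env))
        total_reward += reward
    return env.plans[0].desc, -total_reward


def greedy_policy(env):
    # Baseline: join the pair with the smallest intermediate result.
    # A trained PPO policy would replace this function.
    return min(env.actions(),
               key=lambda a: env.join_rows(env.plans[a[0]], env.plans[a[1]]))


if __name__ == "__main__":
    env = JoinOrderEnv({"A": 1000, "B": 100, "C": 10},
                       {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1})
    print(rollout(env, greedy_policy))
    rng = random.Random(0)
    print(rollout(env, lambda e: rng.choice(e.actions())))
```

With these made-up statistics the greedy baseline joins `B` and `C` first (100 rows) and then `A` (1000 rows), for a total cost of 1100; the other two orders cost 2000 and 11000. Because every pair of subplans is a legal action, the agent is not restricted to left-deep trees, which is one way a learned policy can reach plans outside a narrower search space.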
