Abstract
Apache Spark is one of the most widely used frameworks for cluster computing, in which data are processed in parallel across a cluster of unreliable commodity machines. It processes large volumes of data considerably faster than the MapReduce framework. To provide optimized and fast SQL query processing, Apache Spark includes a module named Spark SQL, which allows users to combine relational processing and functional programming in one place. Spark SQL performs many optimizations by leveraging the Spark core through its query optimizer, called the Catalyst optimizer, which applies a large set of rules to optimize queries for efficient execution. In this paper, we discuss a scenario in which the Catalyst optimizer is unable to optimize a query competently for a specific case, leading to inefficient memory usage and increased query execution time in Spark SQL. To address this issue, we propose a solution that optimizes the query further, significantly reducing the time and memory consumed by the shuffling process.
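To make the setting concrete, the following is a minimal sketch, not taken from the paper, of how Catalyst's optimization stages and shuffle operators can be inspected in Spark SQL. The table and column names (orders, customers, custId, amount) are hypothetical stand-ins for the paper's workload; `explain(true)` prints the parsed, analyzed, Catalyst-optimized, and physical plans, where shuffles appear as Exchange operators.

```scala
import org.apache.spark.sql.SparkSession

object CatalystPlanDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("catalyst-plan-demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical tables standing in for the paper's workload.
    val orders = Seq((1, 100), (2, 200), (1, 300)).toDF("custId", "amount")
    val customers = Seq((1, "Alice"), (2, "Bob")).toDF("custId", "name")
    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")

    // A join followed by an aggregation; each can trigger a shuffle
    // (an Exchange node) in the physical plan that Catalyst produces.
    val result = spark.sql(
      """SELECT c.name, SUM(o.amount) AS total
        |FROM orders o JOIN customers c ON o.custId = c.custId
        |GROUP BY c.name""".stripMargin)

    // Prints all four plan stages; shuffle cost shows up as Exchange
    // operators in the physical plan.
    result.explain(true)

    spark.stop()
  }
}
```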