Automated Translation of Functional Big Data Queries to SQL

Guoqiang Zhang, Benjamin Mariano, Işıl Dillig, Xipeng Shen

doi:10.1145/3586047

Abstract

Big data analytics frameworks like Apache Spark and Flink enable users to implement queries over large, distributed databases using functional APIs. In recent years, these APIs have grown in popularity because their functional interfaces abstract away much of the minutiae of distributed programming required by traditional query languages like SQL. However, the convenience of these APIs comes at a cost because functional queries are often less efficient than their SQL counterparts. Motivated by this observation, we present a new technique for automatically transpiling functional queries to SQL. While our approach is based on the standard paradigm of counterexample-guided inductive synthesis, it uses a novel column-wise decomposition technique to split the synthesis task into smaller subquery synthesis problems. We have implemented this approach as a new tool called RDD2SQL for translating Spark RDD queries to SQL and empirically evaluate the effectiveness of RDD2SQL on a set of real-world RDD queries. Our results show that (1) most RDD queries can be translated to SQL, (2) our tool is very effective at automating this translation, and (3) performing this translation offers significant performance benefits.
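
To make the translation task concrete, the following is a minimal sketch of what "a functional query and its SQL counterpart" can look like. It is not the RDD2SQL tool and does not use Spark: plain Python collections stand in for an RDD, the standard-library `sqlite3` module stands in for a SQL engine, and the `orders` data, the `functional_query` and `sql_query` helpers, and the query itself are all hypothetical. The check at the end only compares the two queries on given inputs; agreement on a few inputs is not a proof of equivalence, though a disagreement would be the kind of counterexample a counterexample-guided loop could use to reject a candidate.

```python
import sqlite3
from collections import defaultdict

# Hypothetical input: (customer, amount) order records.
orders = [("ann", 30), ("bob", 5), ("ann", 20), ("cat", 50), ("bob", 70)]


def functional_query(rows):
    """RDD-style pipeline: filter rows, then sum amounts per customer key."""
    kept = filter(lambda r: r[1] >= 10, rows)
    totals = defaultdict(int)
    for customer, amount in kept:  # plays the role of a reduce-by-key step
        totals[customer] += amount
    return sorted(totals.items())


# Candidate SQL translation of the pipeline above (written by hand here).
SQL_QUERY = """
    SELECT customer, SUM(amount)
    FROM orders
    WHERE amount >= 10
    GROUP BY customer
    ORDER BY customer
"""


def sql_query(rows):
    """Load the rows into an in-memory SQLite table and run the candidate SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(SQL_QUERY).fetchall()
    conn.close()
    return result


def agree_on(rows):
    """True if both queries return the same result on this particular input."""
    return functional_query(rows) == sql_query(rows)


if __name__ == "__main__":
    print(functional_query(orders))
    print(agree_on(orders))
```

Each output column here (the customer key and the summed amount) can be read off the functional pipeline separately, which loosely illustrates why splitting a translation problem by column may make it smaller; the paper's actual decomposition technique is not reproduced in this sketch.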

Journal: Proceedings of the ACM on Programming Languages
Publication Date: Apr 6, 2023
Citations: 1
