Optimizing Interactive Development of Data-Intensive Applications.

Matteo Interlandi, Miryung Kim, Muhammad Ali Gulzar, Sai Deep Tetali, Joseph Noor, Todd Millstein, Tyson Condie

doi:10.1145/2987550.2987565

Abstract

Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications.


Journal: Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)
Publication Date: Oct 5, 2016
Citations: 12

Similar Papers

Interactive and automated debugging for big data analytics
Muhammad Ali Gulzar
27 May 2018

DAG-Based Formal Modeling of Spark Applications with MSVL
Kaixuan Fan ... Meng Wang
Information | VOL. 14
12 Dec 2023

Big data and the web: algorithms for data intensive scalable computing
01 Jan 2012

Efficient Fuzz Testing for Apache Spark Using Framework Abstraction
Qian Zhang ... Rohan Padhye
01 May 2021
