Automated Debugging in Data-Intensive Scalable Computing.

Muhammad Ali Gulzar, Mingda Li, Xueyuan Han, Miryung Kim, Tyson Condie, Matteo Interlandi

doi:10.1145/3127479.3131624

Abstract

Developing Big Data Analytics workloads often involves trial-and-error debugging, because datasets are unclean or developers make wrong assumptions about the data. When errors arise (e.g., a program crash or outlier results), developers are often interested in identifying a subset of the input data that reproduces the problem. BigSift is a new faulty data localization approach that combines insights from automated fault isolation in software engineering and data provenance in database systems to find a minimum set of failure-inducing inputs. BigSift redefines data provenance for the purpose of debugging using a test oracle function and implements several unique optimizations specifically geared towards the iterative nature of automated debugging workloads. BigSift improves the accuracy of fault localizability by several orders of magnitude (~10³ to 10⁷×) compared to Titian data provenance, and improves performance by up to 66× compared to Delta Debugging, an automated fault-isolation technique. For each faulty output, BigSift is able to localize fault-inducing data within 62% of the original job running time.
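The abstract combines two ideas: data provenance, which traces a faulty output back to the inputs that produced it, and Delta Debugging, which shrinks that set to a minimal failure-inducing subset as judged by a test oracle. The sketch below shows how the two steps could fit together on a toy in-memory job (per-key sums). It is not BigSift's implementation, which runs on a distributed dataflow system and adds its own optimizations; `run_job`, `localize`, the lineage map, and the "negative total" oracle are all hypothetical names and assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Record = Tuple[str, int]


def _split(items: List[T], n: int) -> List[List[T]]:
    """Split items into n contiguous, non-empty chunks of near-equal size."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def ddmin(inputs: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Delta Debugging (ddmin): shrink a failing input to a 1-minimal failing subset.

    `fails` is the test oracle: it returns True when the job, run on the
    given subset, still reproduces the problem.
    """
    current = list(inputs)
    if not fails(current):
        raise ValueError("the full input must reproduce the failure")
    n = 2  # granularity: number of chunks to try
    while len(current) >= 2:
        chunks = _split(current, n)
        reduced = False
        # 1) Does any single chunk on its own still fail?
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break
        # 2) Otherwise, does removing any single chunk still fail?
        if not reduced:
            for i in range(len(chunks)):
                complement = [x for j, c in enumerate(chunks) if j != i for x in c]
                if fails(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break
        # 3) Otherwise refine granularity, or stop once every element is its own chunk.
        if not reduced:
            if n >= len(current):
                break
            n = min(2 * n, len(current))
    return current


def run_job(records: List[Record]) -> Tuple[Dict[str, int], Dict[str, List[Record]]]:
    """Hypothetical toy job: sum values per key, recording which inputs fed each output."""
    totals: Dict[str, int] = {}
    lineage: Dict[str, List[Record]] = {}
    for rec in records:
        key, value = rec
        totals[key] = totals.get(key, 0) + value
        lineage.setdefault(key, []).append(rec)
    return totals, lineage


def localize(records: List[Record], faulty_key: str,
             is_faulty: Callable[[int], bool]) -> List[Record]:
    """Hypothetical pipeline: provenance narrows the inputs, then ddmin minimizes them."""
    # Step 1: backward trace, keeping only inputs that contributed to the faulty output.
    _, lineage = run_job(records)
    candidates = lineage[faulty_key]

    # Step 2: the test oracle reruns the job on a subset and checks the faulty output.
    def fails(subset: List[Record]) -> bool:
        sub_totals, _ = run_job(subset)
        return faulty_key in sub_totals and is_faulty(sub_totals[faulty_key])

    return ddmin(candidates, fails)


if __name__ == "__main__":
    records = [("a", 3), ("b", 5), ("a", -100), ("b", 2), ("a", 4), ("c", 7)]
    print(localize(records, "a", lambda total: total < 0))  # → [('a', -100)]
```

Running provenance first means the oracle only ever reruns the job on the three records that reached key `"a"` rather than on all six inputs, which is the intuition behind pairing the two techniques; the actual speedups reported above come from the paper, not from this sketch.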

Venue: Proceedings of the ... ACM Symposium on Cloud Computing (SoCC)
Publication Date: Sep 24, 2017
Citations: 20

Similar Papers

BigSift: automated debugging of big data analytics in data-intensive scalable computing
Muhammad Ali Gulzar ... Miryung Kim
26 Oct 2018

The Implementation of Titian for Data Provenance on DISC Systems Automated Debugging
Agista Putri ... Nungki Selviandro
International Journal on Information and Communication Technology (IJoICT) | VOL. 10
03 Jul 2024

SEIZE User Desired Moments: Runtime Inspection for Parallel Dataflow Systems
Youfu Li ... Carlo Zaniolo
01 Nov 2020

SEIZE: Runtime Inspection for Parallel Dataflow Systems
Youfu Li ... Wei Wang
IEEE Transactions on Parallel and Distributed Systems | VOL. 32
02 Nov 2020
