BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark.

Miryung Kim, Todd Millstein, Sai Deep Tetali, Muhammad Ali Gulzar, Matteo Interlandi, Tyson Condie, Seunghyun Yoo

doi:10.1145/2884781.2884813

Abstract

Developers use cloud computing platforms to process a large quantity of data in parallel when developing big data analytics. Debugging the massive parallel computations that run in today's data centers is time-consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next-generation data-intensive scalable cloud computing platform. This requires re-thinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay and naively inspecting millions of records using a watchpoint is too time-consuming for an end user. First, BIGDEBUG's simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time saving compared to the baseline replay debugger. The results show that BIGDEBUG supports debugging at interactive speeds with minimal performance impact.
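
To make two of the ideas in the abstract concrete (a guarded watchpoint and capture of crash-inducing records), here is a minimal sketch in plain Python. It is not BigDebug's API and does not use Spark: nested lists stand in for distributed partitions, and every name in it (`map_with_culprits`, `watch`, `culprits`) is hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical record describing a crash culprit:
# (partition index, record index, offending record, error description)
Culprit = Tuple[int, int, Any, str]


def map_with_culprits(
    partitions: List[List[Any]],
    fn: Callable[[Any], Any],
    watch: Optional[Callable[[Any], bool]] = None,
) -> Tuple[List[Any], List[Culprit], List[Any]]:
    """Apply fn to every record without aborting on a failing record.

    Records that raise are set aside with their position, so they can be
    inspected or fixed later; records matching the optional `watch` guard
    are collected for inspection, as a stand-in for a watchpoint.
    """
    results: List[Any] = []
    culprits: List[Culprit] = []
    watched: List[Any] = []
    for p_idx, partition in enumerate(partitions):
        for r_idx, record in enumerate(partition):
            # Guarded watchpoint: keep only the records the user asked for.
            if watch is not None and watch(record):
                watched.append(record)
            try:
                results.append(fn(record))
            except Exception as exc:
                # Remember the failing record instead of failing the job.
                culprits.append((p_idx, r_idx, record, repr(exc)))
    return results, culprits, watched


# Two toy "partitions"; the record "x" cannot be parsed as an integer.
partitions = [["1", "2"], ["x", "40"]]
results, culprits, watched = map_with_culprits(
    partitions, int, watch=lambda r: len(r) > 1
)
```

In this toy run the unparseable record is reported together with its partition and record index while the remaining records are still processed, which is the behavior the abstract describes at a much larger scale.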


Journal: Proceedings - International Conference on Software Engineering. International Conference on Software Engineering
Publication Date: May 14, 2016
Citations: 91