Abstract

A long-running analytic task on big data often leaves a developer in the dark without providing valuable feedback about the status of the execution. In addition, a failed job that needs to restart from scratch can waste earlier computing resources. An effective method to address these issues is to allow the developer to debug the task during its execution, which is unfortunately not supported by existing big data solutions. In this paper we develop a system called Amber that supports responsive debugging during the execution of a workflow task. After starting the execution, the developer can pause the job at will, investigate the states of the cluster, modify the job, and resume the computation. She can also set conditional breakpoints to pause the execution when certain conditions are satisfied. In this way, the developer can gain a much better understanding of the run-time behavior of the execution and more easily identify issues in the job or data. Amber is based on the actor model, a distributed computing paradigm that provides concurrent units of computation using actors. We give a full specification of Amber, and implement it on top of the Orleans system. Our experiments show its high performance and usability of debugging on computing clusters.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call