Bolt-on, Compact, and Rapid Program Slicing for Notebooks

Shreya Shankar, Sarah Chasins, Aditya Parameswaran, Stephen Macke, Andrew Head

doi:10.14778/3565838.3565855

Abstract

Computational notebooks are commonly used for iterative workflows, such as exploratory data analysis. This process lends itself to the accumulation of old code and hidden state, making it hard for users to reason about the lineage of, e.g., plots depicting insights or trained machine learning models. One way to reason about the code used to generate various notebook data artifacts is to compute a program slice, but traditional static approaches to slicing can be both inaccurate (failing to contain relevant code for artifacts) and conservative (containing unnecessary code for an artifact). We present nbslicer, a dynamic slicer optimized for the notebook setting whose instrumentation for resolving dynamic data dependencies is both bolt-on (and therefore portable) and switchable (allowing it to be selectively disabled in order to reduce instrumentation overhead). We demonstrate nbslicer's ability to construct small and accurate backward slices (i.e., historical cell dependencies) and forward slices (i.e., cells affected by the "rerun" of an earlier cell), thereby improving reproducibility in notebooks and enabling faster reactive re-execution, respectively. Comparing nbslicer with a static slicer on 374 real notebook sessions, we found that nbslicer filters out far more superfluous program statements while maintaining slice correctness, giving slices that are, on average, 66% and 54% smaller for backward and forward slices, respectively.
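To make the two slice directions concrete, here is a minimal Python sketch. It assumes cell-level granularity and assumes that the symbols each cell execution read and wrote are already known. The CellGraph class and its record, backward_slice, and forward_slice methods are hypothetical illustrations, not nbslicer's interface. nbslicer obtains these dependencies through its bolt-on runtime instrumentation, which this sketch does not reproduce; here the reads and writes are passed in by hand.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class CellGraph:
    """Hypothetical cell-level dependency graph (not nbslicer's actual API).

    Each cell execution (identified by a unique execution counter) records
    the symbols it read and wrote. A read is resolved to whichever execution
    last wrote that symbol at the moment of the read, i.e., dynamically.
    """
    last_writer: dict = field(default_factory=dict)                   # symbol -> execution id
    parents: dict = field(default_factory=lambda: defaultdict(set))   # execution -> executions it read from
    children: dict = field(default_factory=lambda: defaultdict(set))  # execution -> executions that read from it

    def record(self, cell, reads, writes):
        # Resolve reads first so that `x = f(x)` depends on the previous writer of x.
        for sym in reads:
            if sym in self.last_writer:
                src = self.last_writer[sym]
                self.parents[cell].add(src)
                self.children[src].add(cell)
        # Then this execution becomes the current writer of each written symbol.
        for sym in writes:
            self.last_writer[sym] = cell

    @staticmethod
    def _closure(start, edges):
        # Breadth-first transitive closure from `start` along `edges`.
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def backward_slice(self, cell):
        # Every execution this cell's results (transitively) depend on.
        return sorted(self._closure(cell, self.parents))

    def forward_slice(self, cell):
        # Every execution affected if this cell were rerun.
        return sorted(self._closure(cell, self.children))


g = CellGraph()
g.record(1, reads=[], writes=["df"])           # df = load()
g.record(2, reads=["df"], writes=["df"])       # df = clean(df)
g.record(3, reads=[], writes=["scratch"])      # unrelated scratch cell
g.record(4, reads=["df"], writes=["model"])    # model = fit(df)
g.record(5, reads=["model"], writes=["plot"])  # plot = show(model)

print(g.backward_slice(5))  # [1, 2, 4, 5]
print(g.forward_slice(2))   # [2, 4, 5]
```

Because each read binds to the writer that was current at execution time, cell 4 depends on the cleaned df produced by cell 2 rather than directly on cell 1, and the unrelated cell 3 appears in neither slice.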
