Visualising data science workflows to support third-party notebook comprehension: an empirical study

Dhivyabharathi Ramasamy, Abraham Bernstein, Cristina Sarasua, Alberto Bacchelli

doi:10.1007/s10664-023-10289-9

Open Access

https://doi.org/10.1007/s10664-023-10289-9

Journal: Empirical Software Engineering
Publication Date: Mar 23, 2023
Citations: 2
License type: open-access

Affiliation: University of Zurich

Abstract

Data science is an exploratory and iterative process that often leads to complex and unstructured code. This code is usually poorly documented and, consequently, hard to understand by a third party. In this paper, we first collect empirical evidence for the non-linearity of data science code from real-world Jupyter notebooks, confirming the need for new approaches that aid in data science code interaction and comprehension. Second, we propose a visualisation method that elucidates implicit workflow information in data science code and assists data scientists in navigating the so-called garden of forking paths in non-linear code. The visualisation also provides information such as the rationale and the identification of the data science pipeline step based on cell annotations. We conducted a user experiment with data scientists to evaluate the proposed method, assessing the influence of (i) different workflow visualisations and (ii) cell annotations on code comprehension. Our results show that visualising the exploration helps the users obtain an overview of the notebook, significantly improving code comprehension. Furthermore, our qualitative analysis provides more insights into the difficulties faced during data science code comprehension.
