Abstract

Exploratory data analysis (EDA) is an iterative process in which data scientists interact with data to extract information about its quality and shape and to derive knowledge and new insights into the dataset's domain. However, data scientists are rarely domain experts with hands-on knowledge of that domain. Integrating domain knowledge into the analytic process is a complex challenge that usually requires constant communication between data scientists and domain experts. For this reason, it is desirable to reuse the domain insights from exploratory analyses in similar use cases. With this objective in mind, we present a conceptual system design for extracting domain expertise during EDA and using it to guide other data scientists in similar use cases. Our system design introduces two concepts, interaction storage and analysis context storage, to record user interactions and interesting data points during an exploratory analysis. For new use cases, it identifies historical interactions from similar use cases and uses the recorded data to construct candidate interaction sequences and predict their potential insight, i.e., the insight generated by performing the sequence. Based on these predictions, the system recommends the sequences with the highest predicted insight to the data scientist. We implement a prototype to test the general feasibility of our system design and to enable further research in this area. Within the prototype, we present an exemplary use case that demonstrates the usefulness of recommended interactions. Finally, we critically reflect on our first prototype and discuss research opportunities resulting from our system design.
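
The recommendation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the Jaccard similarity between use-case descriptors, and the scoring rule (recorded insight weighted by similarity) are all assumptions made for illustration, as are the names RecordedSession, jaccard, and recommend and the example data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedSession:
    """One historical EDA session (hypothetical record layout)."""
    use_case_tags: frozenset  # descriptors of the use case, e.g. domain terms
    interactions: tuple       # ordered interaction names from the session
    insight: float            # insight score assigned when the session was evaluated


def jaccard(a, b):
    """Set overlap in [0, 1]; a stand-in for any use-case similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(history, new_tags, top_k=3, min_similarity=0.2):
    """Rank historical interaction sequences for a new use case.

    Assumption: predicted insight is approximated as the recorded insight
    weighted by use-case similarity. A real system could use a learned model.
    """
    scored = []
    for session in history:
        similarity = jaccard(session.use_case_tags, new_tags)
        if similarity >= min_similarity:
            scored.append((similarity * session.insight, session.interactions))
    # Highest predicted insight first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical history; tags, interaction names, and scores are invented.
history = [
    RecordedSession(frozenset({"manufacturing", "sensor", "temperature"}),
                    ("filter_outliers", "plot_timeseries"), 0.9),
    RecordedSession(frozenset({"manufacturing", "sensor"}),
                    ("histogram",), 0.5),
    RecordedSession(frozenset({"retail", "sales"}),
                    ("group_by_region",), 0.8),
]
recommendations = recommend(history, frozenset({"manufacturing", "sensor", "pressure"}))
```

In this sketch the retail session shares no descriptors with the new use case and is filtered out, while the two manufacturing sessions are ranked by their similarity-weighted insight.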

Highlights

  • Since the beginning of knowledge discovery in databases, one of the most common tasks of a data scientist has been to find interesting insights and discover knowledge in datasets by performing exploratory data analysis (EDA) [1].

  • It is desirable to reuse the domain insights from exploratory analyses in similar use cases. With this objective in mind, we present a conceptual system design for extracting domain expertise during EDA and using it to guide other data scientists in similar use cases.

  • We present a system design that uses domain expertise during EDA to increase the quality of findings and to shorten the time needed to gain insights.

Summary

Introduction

Since the beginning of knowledge discovery in databases, one of the most common tasks of a data scientist has been to find interesting insights and discover knowledge in datasets by performing exploratory data analysis (EDA) [1]. During this process, the scientist examines the data to find patterns and trends, detect anomalies (e.g., outliers), and check the validity of hypotheses within the data. Several similar EDA recommender systems (e.g., [2,3,4]) and visionary systems for fully automated EDA (e.g., [5,6]) have been proposed. Those approaches focus on EDA tools that provide extensive means to statistically analyze and visualize a dataset in order to discover insightful views.

Problem Description
Related Work
Systems to Ease the Visualization of Data
Systems to Recommend Statistically Interesting Views on Datasets
Systems to Recommend Statistically Relevant Next Steps during EDA
Systems to Track and View Interaction Data
Recommender Systems
System Workflow
Record EDA Sessions
Evaluate EDA Sessions
Gather Recommendation Data
Provide Recommendations
Proof of Concept System
Exemplary Use Case
Critical Reflection and Research Opportunities
Findings
Conclusions and Future Work
