Abstract
Exploratory data analysis (EDA) is an iterative process where data scientists interact with data to extract information about their quality and shape as well as derive knowledge and new insights into the related domain of the dataset. However, data scientists are rarely experienced domain experts who have tangible knowledge about a domain. Integrating domain knowledge into the analytic process is a complex challenge that usually requires constant communication between data scientists and domain experts. For this reason, it is desirable to reuse the domain insights from exploratory analyses in similar use cases. With this objective in mind, we present a conceptual system design on how to extract domain expertise while performing EDA and utilize it to guide other data scientists in similar use cases. Our system design introduces two concepts, interaction storage and analysis context storage, to record user interaction and interesting data points during an exploratory analysis. For new use cases, it identifies historical interactions from similar use cases and facilitates the recorded data to construct candidate interaction sequences and predict their potential insight—i.e., the insight generated from performing the sequence. Based on these predictions, the system recommends the sequences with the highest predicted insight to data scientist. We implement a prototype to test the general feasibility of our system design and enable further research in this area. Within the prototype, we present an exemplary use case that demonstrates the usefulness of recommended interactions. Finally, we give a critical reflection of our first prototype and discuss research opportunities resulting from our system design.
Highlights
Since the beginning of knowledge discovery in databases, one of the most common tasks of a data scientist is to find interesting insights and discover knowledge in datasets by performing exploratory data analysis (EDA) [1]
It is desirable to reuse the domain insights from exploratory analyses in similar use cases. With this objective in mind, we present a conceptual system design on how to extract domain expertise while performing Exploratory data analysis (EDA) and utilize it to guide other data scientists in similar use cases
We present a system design to utilize domain expertise during EDA to increase the quality of findings and to speed up the time to gain insights
Summary
Since the beginning of knowledge discovery in databases, one of the most common tasks of a data scientist is to find interesting insights and discover knowledge in datasets by performing exploratory data analysis (EDA) [1] During this process, the scientist examines the data to find patterns and trends, detect anomalies (e.g., outliers) and check the validity of hypotheses within the data. Several similar EDA recommender systems (e.g., [2,3,4]) and visionary systems for fully automated EDA (e.g., [5,6]) have been proposed Those approaches focus on EDA tools that provide extensive means to statistically analyze and visualize a dataset to discover insightful views.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.