Intelligent User Assistance for Automated Data Mining Method Selection

Patrick Zschech, Kai Heinrich, Richard Horn, Christian Janiesch, Daniel Höschele

doi:10.1007/s12599-020-00642-3

Open Access

https://doi.org/10.1007/s12599-020-00642-3

Journal: Business & Information Systems Engineering
Publication Date: Mar 18, 2020
Citations: 16
License type: open-access

Affiliation: TU Dresden

Abstract

In any data science and analytics project, the task of mapping a domain-specific problem to an adequate set of data mining methods by experts of the field is a crucial step. However, these experts are not always available and data mining novices may be required to perform the task. While there are several research efforts for automated method selection as a means of support, only a few approaches consider the particularities of problems expressed in the natural and domain-specific language of the novice. The study proposes the design of an intelligent assistance system that takes problem descriptions articulated in natural language as an input and offers advice regarding the most suitable class of data mining methods. Following a design science research approach, the paper (i) outlines the problem setting with an exemplary scenario from industrial practice, (ii) derives design requirements, (iii) develops design principles and proposes design features, (iv) develops and implements the IT artifact using several methods such as embeddings, keyword extractions, topic models, and text classifiers, (v) demonstrates and evaluates the implemented prototype based on different classification pipelines, and (vi) discusses the results’ practical and theoretical contributions. The best performing classification pipelines show high accuracies when applied to validation data and are capable of creating a suitable mapping that exceeds the performance of joint novice assessments and simpler means of text mining. The research provides a promising foundation for further enhancements, either as a stand-alone intelligent assistance system or as an add-on to already existing data science and analytics platforms.

Highlights

Data science and analytics (DSA) projects are generally multidisciplinary and require combined expertise from several areas, such as profound domain knowledge, analytical modeling skills, and experience in collecting and processing data from heterogeneous IT systems (Mikalef and Krogstie 2019).
We collected 60 different real-world problem statements, distributed among the three target classes, based on problem descriptions derived from our own industrial DSA projects as well as selected data mining (DM) competitions from online platforms such as Kaggle.
When gathering the set of problem statements, we ensured (i) that the underlying scenarios originated from a wide range of application domains, (ii) that the keywords and key phrases signaling a specific class of DM method showed a sufficient degree of variability, and (iii) that the descriptions contained varying amounts of filler information and noise.
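
To make the mapping from a natural-language problem statement to a class of DM methods more concrete, here is a minimal sketch. It is not the authors' pipeline: it swaps in a plain bag-of-words multinomial Naive Bayes classifier for the embedding, keyword, and topic-model pipelines named in the abstract. The class labels (`classification`, `regression`, `clustering`), the toy statements in `TRAIN`, and the names `tokenize` and `NaiveBayes` are all assumptions made for illustration; the text above does not name the three target classes, and none of the statements come from the paper's data set.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labels and toy statements (assumptions, not the paper's data).
TRAIN = [
    ("predict whether a machine will fail or not in the next shift", "classification"),
    ("decide which category of defect a produced part belongs to", "classification"),
    ("estimate the remaining useful life of a bearing in hours", "regression"),
    ("forecast how much energy the plant will consume next month", "regression"),
    ("find groups of customers with similar purchasing behavior", "clustering"),
    ("discover segments of similar sensor profiles without labels", "clustering"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, samples):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.class_counts = Counter()            # label -> number of statements
        self.vocab = set()
        for text, label in samples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.class_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            log_prob = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                log_prob += math.log((self.word_counts[label][tok] + 1) / denom)
            scores[label] = log_prob
        return max(scores, key=scores.get)


model = NaiveBayes().fit(TRAIN)
print(model.predict("find groups of similar customers"))
```

A word-count model like this depends heavily on the exact keywords seen during training, which is one reason the variability of keywords and the amount of noise in the collected statements matter for any evaluation.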

Summary

Introduction

Data science and analytics (DSA) projects are generally multidisciplinary and require combined expertise from several areas, such as profound domain knowledge, analytical modeling skills, and experience in collecting and processing data from heterogeneous IT systems (Mikalef and Krogstie 2019). Despite improved tool support, one crucial step remains challenging throughout the DSA implementation process: the mapping between (i) the problem space expressed in the language and the concepts of the domain-specific problem setting, and (ii) the class of generic DM methods providing an algorithmic solution for data-driven decision support (Choinski and Chudziak 2009; Eckert and Ehmke 2017). This step requires a translation that determines the character of the subsequent DSA implementation process and thus the success of the whole project (Hogl 2003). The translation is carried out by well-trained DSA experts, who bring the necessary skills to merge both contexts, that is, the methodical skills needed for a typical data lifecycle as well as the business understanding required to grasp the underlying problem characteristics and achieve the desired outcome towards economic goals (Debortoli et al. 2014; Schumann et al. 2016).

