Abstract

The Cross-Industry Standard Process for Data Mining (CRISP-DM), despite being the most popular data mining process for more than two decades, is known to leave organizations that lack operational data mining experience puzzled and unable to start their data mining projects. This is especially apparent in the first phase, Business Understanding, by the end of which the data mining goals of the project should be specified; doing so arguably requires at least a conceptual understanding of the knowledge discovery process. We propose to bridge this knowledge gap from a Data Science perspective by applying Natural Language Processing (NLP) techniques to organizations’ e-mail exchange repositories to extract explicitly stated business goals from the conversations, thus bootstrapping the Business Understanding phase of CRISP-DM. Our NLP-Automated Method for Business Understanding (NAMBU) generates a list of business goals that can subsequently be used to specify data mining goals. Validation against the results of manual business goal extraction from the Enron corpus demonstrates the usefulness of NAMBU when applied to large datasets.

Highlights

  • The NLP-Automated Method for Business Understanding (NAMBU) for business goal extraction was developed by building upon previous approaches to automatic goal identification found in the scientific literature

  • Even when a business goal cannot be translated directly into a data mining goal, and so falls outside the Business Understanding phase of CRISP-DM, the method can be applied in the wider context of the enterprise’s overall knowledge management or even master data management

  • This paper presents a novel approach that applies natural language processing techniques to corporate e-mail repositories to facilitate the identification and formulation of an organization’s business goals


Introduction

CRISP-DM, being the most popular method and the de facto worldwide standard for data mining [1,2], provides a roadmap for DM projects, as illustrated, by specifying their individual phases (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment) and going deeper into particular tasks. It is widely known that CRISP-DM does not provide sufficient support for the first step of the data mining process, namely understanding the concerns of the problem owner (i.e., the business insider) [3]. This step, which is referred to as the Business Understanding phase within
