Abstract
Process-oriented data mining (process mining) uses algorithms and data (in the form of event logs) to construct models that aim to provide insights into organisational processes. The quality of the data (both form and content) presented to the modeling algorithms is critical to the success of the process mining exercise. Cleaning event logs to address quality issues prior to conducting a process mining analysis is a necessary, but generally tedious and ad hoc task. In this paper we describe a set of data quality issues, distilled from our experiences in conducting process mining analyses, commonly found in process mining event logs or encountered while preparing event logs from raw data sources. We show that patterns are used in a variety of domains as a means for describing commonly encountered problems and solutions. The main contributions of this article are in showing that a patterns-based approach is applicable to documenting commonly encountered event log quality issues, the formulation of a set of components for describing event log quality issues as patterns, and the description of a collection of 11 event log imperfection patterns distilled from our experiences in preparing event logs. We postulate that a systematic approach to using such a pattern repository to identify and repair event log quality issues benefits both the process of preparing an event log and the quality of the resulting event log. The relevance of the pattern-based approach is illustrated via application of the patterns in a case study and through an evaluation by researchers and practitioners in the field.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.