Similarity-Based Approaches for Determining the Number of Trace Clusters in Process Discovery

Pieter De Koninck, Jochen De Weerdt

doi:10.1007/978-3-662-55862-1_2

Abstract

Given the complexity of real-life event logs, several trace clustering techniques have been proposed to partition an event log into subsets with a lower degree of variation. These techniques generally assume that the number of clusters is known in advance, which is rarely the case in practice. This paper therefore presents two similarity-based approaches for determining an appropriate number of clusters in a trace clustering context: a stability-based method and a separation-based method. The stability-based method iteratively calculates the similarity between clustered versions of perturbed and unperturbed event logs. The separation-based method instead relies on between-cluster dissimilarity. For practical validation, both approaches are tested on multiple real-life datasets to investigate the complementarity of their components. Our results suggest that both methods are successful in identifying an appropriate number of trace clusters.
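The stability idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that perturbation means randomly dropping a fraction of traces, that traces are encoded as activity-count vectors, that plain k-means stands in for a trace clustering technique, and that the Rand index serves as the similarity between two clusterings. All function names (`profile`, `kmeans`, `rand_index`, `stability`, `choose_k`) and parameters (`rounds`, `keep`) are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def profile(trace, alphabet):
    """Encode a trace as activity counts (an assumed, deliberately simple encoding)."""
    counts = Counter(trace)
    return tuple(counts[a] for a in alphabet)


def kmeans(points, k, rng, iters=20):
    """Plain k-means standing in for a trace clustering technique (assumption)."""
    # Seed centres from distinct points; raises ValueError if fewer than k exist.
    centers = rng.sample(sorted(set(points)), k)
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centre (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty cluster's centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rand_index(a, b):
    """Fraction of item pairs on which two clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(log, k, rng, rounds=10, keep=0.8):
    """Mean similarity between the clustering of the full log and of perturbed logs."""
    alphabet = sorted({activity for trace in log for activity in trace})
    points = [profile(trace, alphabet) for trace in log]
    base = kmeans(points, k, rng)
    scores = []
    for _ in range(rounds):
        # Perturb the log by keeping a random subset of traces (assumed perturbation).
        idx = sorted(rng.sample(range(len(log)), int(keep * len(log))))
        perturbed = kmeans([points[i] for i in idx], k, rng)
        # Compare only on the traces present in both versions of the log.
        scores.append(rand_index([base[i] for i in idx], perturbed))
    return sum(scores) / rounds


def choose_k(log, candidates, seed=0):
    """Return the candidate k with the highest mean stability, plus all scores."""
    rng = random.Random(seed)
    scores = {k: stability(log, k, rng) for k in candidates}
    return max(scores, key=scores.get), scores
```

A call such as `choose_k(log, range(2, 6))`, where `log` is a list of activity tuples, returns the most stable candidate together with the per-candidate scores. In this sketch a single cluster is trivially stable, so the candidate range should start at 2, and each candidate must not exceed the number of distinct trace profiles in the log.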
