Abstract
When event logs are large, the time needed to analyze them with process mining techniques can become prohibitive. In this paper, we use sampling to reduce an event log to p traces while minimizing the Earth Mover's Distance (EMD) between the sample and the original, unsampled event log. We formalize log sampling in a canonical form and show its link with the EMD, a metric increasingly used in process mining. We then propose three log-sampling algorithms and evaluate them on a collection of 18 event logs from industry. We show that our approach substantially reduces the EMD compared to existing sampling strategies. Moreover, we find that sampled event logs with low EMD tend to have better behavioural quality, which underlines the generality of our work.