Abstract

Process mining is a business process management technique used to extract value from process execution logs. Process discovery algorithms, such as the alpha and heuristic miners, automatically discover (rebuild) business process models from event logs. However, the performance of these techniques is limited when dealing with Big Data. To cope with this issue, we propose a distributed implementation of the alpha and heuristic algorithms, based on the Spark framework, to support efficient and scalable process discovery for big process data. The approach distributes the CPU-intensive phases of these algorithms, such as the construction of the causality matrix. Experimental results show that the proposed algorithms achieve good speed-up and scale-up as both the data size and the number of cluster nodes vary.
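To illustrate the kind of computation being distributed, the following is a minimal single-machine sketch in plain Python, not the authors' Spark implementation. It assumes the event log is already split into partitions of traces, counts directly-follows pairs per partition (a map step), and merges the partial counts (a reduce step). The function names, the partition layout, and the sample traces are hypothetical. The causality test follows the standard alpha-miner definition, and the dependency measure is the standard heuristic-miner formula.

```python
from collections import Counter
from functools import reduce


def directly_follows(traces):
    """Map step: count directly-follows pairs (a, b) in one partition."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def merge(c1, c2):
    """Reduce step: add the partial counts of two partitions."""
    return c1 + c2


def causal_pairs(counts):
    """Alpha-miner causality: a -> b if a>b was observed and b>a never was."""
    return {(a, b) for (a, b) in counts if (b, a) not in counts}


def dependency(counts, a, b):
    """Heuristic-miner dependency: (|a>b| - |b>a|) / (|a>b| + |b>a| + 1)."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


# Hypothetical event log, pre-split into two partitions of traces.
partitions = [
    [["a", "b", "c", "d"], ["a", "c", "b", "d"]],
    [["a", "b", "c", "d"], ["a", "e", "d"]],
]

# Each partition is processed independently, then the results are merged.
counts = reduce(merge, map(directly_follows, partitions), Counter())
causal = causal_pairs(counts)
```

Because each partition is counted independently and the merge is associative and commutative, the same pattern would map naturally onto a cluster framework; how the paper actually partitions the log and builds the matrix in Spark is not specified in the abstract.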
