Abstract
Modern information systems collect and store massive amounts of business process event logs. Over the past decades, numerous process discovery approaches have been proposed to extract descriptive process models from such event logs. To improve process discovery efficiency, event log sampling techniques have been proposed. A sample log is a carefully selected subset of the original log that can be processed at lower computational cost. However, existing sampling techniques have difficulty handling large-scale event logs, e.g., because of their low efficiency. To tackle this challenge, we propose a novel ranking-based event log sampling approach, denoted as \( LogRank^+ \), to support efficient sampling. In addition, we introduce a framework to evaluate the effectiveness of different sampling techniques by quantifying the sampling efficiency and the quality of sample logs. The proposed sampling approach has been implemented in the open-source process mining toolkit ProM. Experimental evaluation with both synthetic and real-life event logs demonstrates that the proposed approach effectively improves event log sampling efficiency while ensuring high quality of the obtained sample logs from a process discovery perspective.