Abstract

Sequence data is more commonly seen than other types of data (e.g., transaction data) in real-world applications. Several mining problems have been formulated for sequence data, such as sequential pattern mining, episode mining, and sequential rule mining. Episode mining is one of the fundamental problems among them and has been studied often. The common wisdom, however, is that discovering frequent episodes alone is not useful enough. In this paper, we propose an efficient utility mining approach, namely UMEpi: Utility Mining of high-utility Episodes from complex event sequences. We propose the concept of the remaining utility of episodes and use it to achieve a tighter upper bound, namely episode-weighted utilization (EWU), which provides better pruning. The optimized EWU-based pruning strategies thus improve mining efficiency. To mine high-utility episodes, UMEpi spans and determines its search space, a prefix-based lexicographic sequence tree, recursively by prefix-spanning in a depth-first way. Finally, extensive experiments on four real-life datasets demonstrate that UMEpi can discover the complete set of high-utility episodes from complex event sequences. Furthermore, the improved variants of UMEpi significantly outperform the baseline in terms of execution time, memory consumption, and scalability.
Impact Statement: This article contributes to the problem of utility-based episode discovery, which is a challenging task in artificial intelligence and data science. To the best of our knowledge, it is the first article that introduces the concept of the remaining utility of an episode in an event sequence and then formulates an alternative definition of EWU, which gives an accurate formulation of the upper bound. The designed UMEpi algorithm can serve as a benchmark for utility-based episode discovery. UMEpi addresses several challenges, and the optimized EWU-based pruning strategies improve mining efficiency on massive datasets. Episode discovery has been successfully applied in a wide range of real-world applications and domains. In summary, this correct and efficient utility episode mining algorithm contributes to artificial intelligence systems in many applications, especially complex event processing and analysis.
