Discovering High Utility Episodes in Sequences

Wensheng Gan, Jerry Chun-Wei Lin, Philip S. Yu, Han-Chieh Chao

doi:10.1109/tai.2022.3223965

Abstract

Sequence data is more commonly seen than other types of data (e.g., transaction data) in real-world applications. For mining sequence data, several problems have been formulated, such as sequential pattern mining, episode mining, and sequential rule mining. As one of the fundamental problems, episode mining has often been studied. The common wisdom is that discovering frequent episodes is not useful enough. In this paper, we propose an efficient utility mining approach, namely UMEpi: Utility Mining of high-utility Episodes from complex event sequences. We propose the concept of remaining utility of episodes and achieve a tighter upper bound, namely episode-weighted utilization (EWU), which provides better pruning. Thus, the optimized EWU-based pruning strategies can further improve mining efficiency. The search space of UMEpi w.r.t. a prefix-based lexicographic sequence tree is spanned and determined recursively for mining high-utility episodes, by prefix-spanning in a depth-first way. Finally, extensive experiments on four real-life datasets demonstrate that UMEpi can discover the complete set of high-utility episodes from complex event sequences. Furthermore, the improved variants of UMEpi significantly outperform the baseline in terms of execution time, memory consumption, and scalability.
Impact Statement: This article contributes to the problem of utility-based episode discovery, which is a challenging task in artificial intelligence and data science. To the best of our knowledge, it is the first article that introduces the concept of the remaining utility of an episode in an event sequence and then formulates an alternative definition of EWU, which is able to provide an accurate formulation of the upper bound. The designed UMEpi algorithm can be a benchmark for utility-based episode discovery. UMEpi addresses several challenges, and the optimized EWU-based pruning strategies can further improve mining efficiency on massive datasets. It is well known that episode discovery has been successfully applied in a wide range of real-world applications and various domains. In summary, this correct and efficient utility episode mining algorithm contributes to artificial intelligence systems in many applications, especially complex event processing and analysis.

Full Text