Abstract

In real-life applications, data mining tasks involve extracting valuable but hidden information from massive data. How to effectively discover interesting patterns in large databases remains an active research topic. Sequential pattern mining is one of the most popular approaches in the data mining domain. Traditional sequential pattern mining research generally focuses on discovering frequent sequential patterns. However, the number of times a pattern occurs does not adequately indicate its importance. For instance, frequent patterns (e.g., pencil and eraser) may not be profitable, whereas infrequent patterns (e.g., extreme weather) may be high-risk. To extract more useful information, researchers have studied the weighted sequential pattern mining task. In this paper, an efficient algorithm for weighted sequential pattern mining, called EWSPM, is proposed. Two new strict upper bounds, namely MWEbound and MSRIWbound, are designed based on the concepts of maximum weight estimation (abbreviated as MWE) and maximum summation of remaining item weights (abbreviated as MSRIW), respectively. These upper bounds achieve better pruning and reduce the size of the search space during the mining process, which significantly shortens execution time. In addition, a database-projection method is employed to optimize memory usage, which mitigates potential memory explosion to a certain degree. Finally, we conducted extensive experiments on nine datasets (both real and synthetic). The experimental results demonstrate that the EWSPM algorithm is capable of mining all interesting patterns efficiently, with the smallest search space. The algorithm also exhibits superior performance in terms of execution time and memory consumption.
