Abstract

Mining all frequent high utility sequences (FHUS) in quantitative sequential databases (QSDBs) generalizes the problem of mining all frequent sequences in non-quantitative sequence databases. Over the last decade, FHUS mining has attracted the attention of many researchers because utility-based sequences are more informative and actionable for decision-making than frequent sequences. Although utility-based sequences have many real-life applications, they are often very numerous, especially for low minimum utility thresholds and long sequences. As a result, they can be difficult for users to analyze, and mining them often requires considerable time and memory. To address this problem, this paper proposes two concise representations of FHUS whose small cardinality provides a compact summary of all FHUS. These representations are the sets FCHUS and FMaxHUS of all frequent closed and frequent maximal high utility sequences, respectively. To mine these representations efficiently, two pruning strategies, one for width pruning and one for depth pruning, are proposed to eliminate low utility sequences early. In addition, a novel local pruning strategy named LPCHUS, based on a new extended measure computed on projected databases, is proposed to eliminate non-closed and non-maximal high utility sequences, together with their extensions, early. Based on these strategies and a novel vertical data structure named SIDUL, an algorithm named FMaxCloHUSM is designed to efficiently mine FCHUS and FMaxHUS, either separately or simultaneously. To the best of our knowledge, this is the first algorithm for discovering these two concise representations. An experimental study on both real-life and synthetic QSDBs shows that the proposed algorithm is efficient in terms of runtime and memory consumption, and that the developed strategies greatly reduce the search space.
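To make the two representations concrete, the following Python sketch filters an already-mined set of FHUS down to its closed and maximal members. It assumes the usual definitions from sequential pattern mining (a sequence in the set is closed if no proper supersequence in the set has the same support, and maximal if no proper supersequence is in the set), which may differ in detail from the paper's formal definitions. It also models sequences as tuples of itemsets and takes supports as given. The helpers `seq`, `is_subsequence` and `closed_and_maximal` are hypothetical, and this brute-force post-processing is not FMaxCloHUSM, which prunes non-closed and non-maximal candidates during the search itself.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption for illustration: a sequence is an ordered tuple of itemsets.
Sequence = Tuple[FrozenSet[str], ...]


def seq(*itemsets: str) -> Sequence:
    """Hypothetical helper: seq("a", "bc") builds <{a}, {b, c}>."""
    return tuple(frozenset(i) for i in itemsets)


def is_subsequence(a: Sequence, b: Sequence) -> bool:
    """True if each itemset of a is a subset of a distinct itemset of b,
    in the same order (greedy earliest match suffices for containment)."""
    j = 0
    for itemset in a:
        while j < len(b) and not itemset <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True


def closed_and_maximal(
    fhus: Dict[Sequence, int]
) -> Tuple[List[Sequence], List[Sequence]]:
    """Split FHUS (mapped to their supports) into (closed, maximal) lists
    under the assumed definitions; quadratic pairwise check."""
    closed: List[Sequence] = []
    maximal: List[Sequence] = []
    for s, sup in fhus.items():
        # Proper supersequences of s that are themselves in the FHUS set.
        supers = [t for t in fhus if t != s and is_subsequence(s, t)]
        if not supers:
            maximal.append(s)
        if all(fhus[t] != sup for t in supers):
            closed.append(s)
    return closed, maximal


# Toy input: these supports are made up for illustration only.
fhus = {seq("a"): 3, seq("a", "b"): 3, seq("a", "bc"): 2, seq("b"): 4}
closed, maximal = closed_and_maximal(fhus)
print(closed)
print(maximal)
```

In this toy input, <{a}> is not closed because <{a},{b}> has the same support, while <{a},{b,c}> is both closed and maximal; every maximal sequence is also closed, so FMaxHUS is never larger than FCHUS. The pairwise comparison is the cost a miner avoids by pruning non-closed and non-maximal candidates before the full FHUS set is ever materialized.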
