Abstract

Mining high utility itemsets (HUIs) is one of the most important research topics in data mining because HUIs take into account both the non-binary frequency values of items in transactions and the different profit values of individual items. However, setting an appropriate minimum utility threshold by trial and error is tedious for users. Mining the top-k high utility itemsets (top-k HUIs), which requires no utility threshold, is therefore becoming an alternative to finding all HUIs. In this paper, we propose a novel algorithm, named TKU-CE (Top-K high Utility mining based on Cross-Entropy method), for mining top-k HUIs. TKU-CE follows the cross-entropy method and treats top-k HUI mining as a combinatorial optimization problem. Its main idea is to generate the top-k HUIs by gradually updating the probabilities of itemsets with high utility values. Compared with state-of-the-art algorithms, TKU-CE is not only easy to implement but also avoids the computational costs incurred by additional data structures, threshold-raising strategies, and pruning strategies. Extensive experimental results show that TKU-CE is efficient and memory-saving, and that it discovers most of the actual top-k HUIs.
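
The core idea, sampling itemsets from a probability vector and shifting that vector toward the highest-utility samples, can be illustrated with a minimal sketch. This is not the authors' TKU-CE implementation. The toy database, the profit table, the function names (`utility`, `ce_top_k`), and every parameter (sample size, elite fraction, smoothing factor, iteration count) are assumptions chosen for illustration only.

```python
import random

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
    {"a": 1, "b": 1, "d": 1},
]
# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 4}
ITEMS = sorted(PROFIT)


def utility(itemset):
    """Utility of an itemset: sum of quantity * profit of its items,
    taken over every transaction that contains the whole itemset."""
    if not itemset:
        return 0
    total = 0
    for transaction in TRANSACTIONS:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * PROFIT[item] for item in itemset)
    return total


def ce_top_k(k, sample_size=50, elite_frac=0.2, alpha=0.7, iterations=30, seed=0):
    """Cross-entropy style search for k high-utility itemsets.

    Returns a list of (itemset, utility) pairs sorted by descending utility.
    The result is heuristic: it covers only itemsets that were sampled.
    """
    rng = random.Random(seed)
    # One inclusion probability per item, initially uninformative.
    prob = {item: 0.5 for item in ITEMS}
    seen = {}  # every evaluated itemset -> its utility
    n_elite = max(1, int(sample_size * elite_frac))

    for _ in range(iterations):
        # Sample candidate itemsets from the current probability vector.
        samples = []
        for _ in range(sample_size):
            candidate = frozenset(i for i in ITEMS if rng.random() < prob[i])
            if candidate:
                samples.append(candidate)
                seen[candidate] = utility(candidate)
        if not samples:
            continue

        # Keep the best-scoring fraction as the elite set.
        samples.sort(key=lambda s: seen[s], reverse=True)
        elite = samples[:n_elite]

        # Move each item's probability toward its frequency in the elite set.
        # The smoothing factor alpha is an assumption that keeps the
        # probabilities from collapsing to 0 or 1 too early.
        for item in ITEMS:
            freq = sum(item in s for s in elite) / len(elite)
            prob[item] = alpha * freq + (1 - alpha) * prob[item]

    ranked = sorted(seen.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return ranked[:k]


if __name__ == "__main__":
    for itemset, value in ce_top_k(3):
        print(sorted(itemset), value)
```

The sketch uses no utility threshold, no auxiliary index structure, and no pruning step. In exchange, the returned list is approximate, because an itemset that is never sampled cannot appear in it.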
