An efficient method for mining high occupancy itemsets based on equivalence class and early pruning

Loan T.T. Nguyen, Thang Mai, Giao-Huy Pham, Unil Yun, Bay Vo

doi:10.1016/j.knosys.2023.110441

Abstract

High occupancy itemset mining is a recent trend in data mining that many researchers have been investigating and applying. Frequent itemset mining often returns a large set of itemsets, whereas businesses need a smaller set to investigate or feed into a recommendation system so that decisions can be made quickly. Applying an occupancy measure to a support-based mining framework therefore benefits decision support systems, and gives managers a new way to visualize reports and analyze data more efficiently. Like frequent itemset mining, high occupancy itemset mining can be applied to any transaction database. In this research, we apply additional conditions to eliminate unqualified itemsets and use the equivalence class property to reduce the runtime of the k-itemset generation process. Moreover, we state a new theorem for a specific class of databases under which the upper-bound occupancy need not be calculated; this both speeds up the generation of high occupancy itemsets and reduces its memory requirements. We develop two new algorithms to solve the problem: fast high occupancy itemset mining (FHOI) and depth-first search (DFS) for high occupancy itemset mining (DFHOI). The new algorithms are evaluated experimentally on different databases in terms of runtime and memory usage.

Full Text