Mining High Utility Itemsets Using Prefix Trees and Utility Vectors

Jun-Feng Qu, Philippe Fournier-Viger, Mengchi Liu, Chunyang Hu, Bo Hang

doi:10.1109/tkde.2023.3256126

Abstract

High utility itemsets can reveal combinations of items that have a high profit, expense, or importance. Mining high utility itemsets in a database with $n$ items generally involves a huge search space of $2^{n}$ itemsets and heavy utility calculations for the explored itemsets. Previous algorithms using prefix tree structures perform two phases, namely candidate generation and testing. To avoid generating candidate itemsets, one-phase algorithms use list or hyper-link structures and have been shown to outperform two-phase algorithms. However, a prefix tree is still an efficient structure for itemset mining problems; in particular, prefix-tree-based algorithms such as FP-Growth have shown excellent performance for mining frequent itemsets. This paper proposes Hamm, a High-performance AlgorithM for Mining high utility itemsets. Hamm employs a novel TV (prefix Tree and utility Vector) structure and mines high utility itemsets in one phase without candidate generation. We also develop an efficient optimization, which is incorporated into Hamm as a component. Using prefix trees and utility vectors, Hamm outperforms state-of-the-art algorithms on various databases in experiments. Experimental results also show that the proposed optimization remarkably reduces the search space and speeds up Hamm.
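
To make the problem statement concrete, the following minimal Python sketch computes itemset utilities on a toy database and enumerates the whole search space by brute force. It is not Hamm and does not use the TV structure; it only illustrates the standard definition of itemset utility (quantity times unit profit, summed over the transactions that contain the itemset). The transactions, unit profits, threshold, and the function names `utility` and `brute_force_huis` are all assumptions made up for this illustration.

```python
from itertools import combinations

# Toy data invented for illustration only (not taken from the paper).
# Each transaction maps an item to its purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
]
# Hypothetical unit profit of each item.
UNIT_PROFIT = {"a": 5, "b": 2, "c": 1, "d": 4}


def utility(itemset, transactions, unit_profit):
    """Sum the utility of `itemset` over every transaction containing all its items."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def brute_force_huis(transactions, unit_profit, min_util):
    """Check all 2^n - 1 non-empty itemsets; keep those with utility >= min_util."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, unit_profit)
            if u >= min_util:
                result[itemset] = u
    return result


if __name__ == "__main__":
    for itemset, u in brute_force_huis(TRANSACTIONS, UNIT_PROFIT, 10).items():
        print(itemset, u)
```

With a threshold of 10, the itemset {a, b} has utility 9 and is rejected, while its superset {a, b, c} has utility 10 and is kept. Utility is therefore not anti-monotone, which is why the support-based pruning used for frequent itemsets does not carry over directly and why exhaustive enumeration like this becomes infeasible as $n$ grows.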

Full Text