Efficient high average-utility itemset mining using novel vertical weak upper-bounds

Tin Truong,Hai Duong,Bac Le,Philippe Fournier-Viger,Unil Yun

doi:10.1016/j.knosys.2019.07.018

Abstract

Discovering high average utility itemsets (HAUIs) in a quantitative database is a popular data mining task, which aims at identifying sets of products (items) purchased together that have a high importance or yield a high profit. However, a key challenge of HAUI mining is that the downward-closure property (DCP) does not hold for the average-utility measure. Many studies have designed average-utility upper-bounds (UBs), which satisfy the DCP, to overestimate the average-utilities of itemsets. Several data structures and algorithms were proposed based on those UBs to discover all HAUIs. However, the proposed UBs are still loose, and the designed structures are still inefficient for computing UBs. To address this issue, this paper proposes four novel weak upper-bounds (WUBs) on the average-utility based on a vertical database representation, two new efficient pruning and tightening strategies to eliminate unpromising candidates early, and a novel list structure named NVUV-list to represent the search space and calculate the proposed WUBs quickly. The novel WUBs are tighter than well-known UBs and WUBs used in state-of-the-art algorithms such as MHAI, TUB-HAUPM and dHAUIM for mining HAUIs. Moreover, the NVUV-list structure is shown to be more efficient for computing vertical WUBs than the IDUL structure employed by dHAUIM. Based on the novel WUBs and NVUV-list structure, a novel algorithm named VMHAUI is proposed for HAUI mining. Experimental results show that it outperforms the state-of-the-art MHAI, TUB-HAUPM and dHAUIM algorithms by one to two orders of magnitudes, and that VMHAUI consumes less memory and performs much less join operations.

Full Text