Abstract

Mining high utility itemsets (HUIs) has been an active research topic in data mining in recent years. Existing HUI mining algorithms typically take two steps: generating candidate itemsets and then identifying the utility values of those candidates. Both steps are usually time-consuming, and the performance of these algorithms depends on the efficiency of each. In this study, we propose an efficient pattern-growth-based HUI mining algorithm, called tail-node tree-based high-utility itemset (TNT-HUI) mining. Supported by a novel tree structure, the tail-node tree (TN-Tree), this algorithm avoids the time-consuming candidate generation step as well as the need to scan the original dataset multiple times to obtain exact utility values. The performance of TNT-HUI was evaluated against state-of-the-art benchmark methods on different datasets. Experimental results showed that TNT-HUI outperformed the benchmark algorithms in both execution time and memory use by orders of magnitude. The performance gap is larger for denser datasets and lower thresholds.

Highlights

  • Pattern discovery from a transactional database has been an important topic in data mining [1,2]

  • Problem definition: in a transaction dataset, an itemset is a high utility itemset if its utility is not less than a user-specified minimum utility value, where the utility of an item in a transaction is defined as its internal utility multiplied by its external utility

  • According to Theorem 1, the TNT-HUI algorithm removes all unpromising items from the original transaction itemsets when it builds the tail-node tree (TN-Tree) from those transaction itemsets
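
The utility definition in the highlights above can be illustrated with a small brute-force sketch. This is not the TNT-HUI algorithm and does not use a TN-Tree; it only enumerates itemsets directly to show what "utility" and "high utility itemset" mean. The toy transactions, the external utility values, and all function names are hypothetical. The sketch assumes the usual convention that an itemset's utility is the sum, over the transactions containing it, of the utilities of its items. The `promising_items` helper further assumes the convention common in the HUI literature that an item is unpromising when its transaction-weighted utility falls below the threshold; whether this matches the paper's Theorem 1 exactly is an assumption.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> quantity (internal utility).
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical external utilities, e.g. unit profit per item.
external_utility = {"a": 5, "b": 2, "c": 1}


def item_utility(item, transaction):
    """Utility of one item in one transaction: internal * external utility."""
    return transaction[item] * external_utility[item]


def itemset_utility(itemset, transactions):
    """Sum of item utilities over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(item_utility(i, t) for i in itemset)
    return total


def high_utility_itemsets(transactions, min_utility):
    """Brute-force enumeration (exponential; for illustration only)."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions)
            if u >= min_utility:
                result[itemset] = u
    return result


def promising_items(transactions, min_utility):
    """Assumed convention: keep items whose transaction-weighted utility
    (sum of the total utilities of the transactions containing the item)
    is at least min_utility."""
    twu = {}
    for t in transactions:
        tu = sum(item_utility(i, t) for i in t)  # utility of the whole transaction
        for i in t:
            twu[i] = twu.get(i, 0) + tu
    return {i for i, v in twu.items() if v >= min_utility}


print(high_utility_itemsets(transactions, min_utility=10))
print(promising_items(transactions, min_utility=15))
```

With these toy values, item `a` has utility 5 in the first transaction (1 × 5) and 10 in the second (2 × 5). The brute-force search is only practical for tiny inputs, which is the cost that tree-based methods aim to avoid.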


Summary

Introduction

Pattern discovery from a transactional database has been an important topic in data mining [1,2]. Because they cannot retrieve exact utility values directly from the tree, existing pattern-growth-based HUI mining methods must scan the original dataset again to identify HUIs, which requires additional passes of data I/O and adds considerable computational overhead. To address this issue, we propose a novel tree structure, called the tail-node tree (TN-Tree), from which the exact utility values can be retrieved without re-scanning the original dataset.

Apriori-Based HUI Mining Algorithms
Pattern-Growth-Based HUI Mining Algorithms
Basic Concepts
TN-Tree for HUI Mining
The Structure of TN-Tree
TN-Tree Construction
Important Concepts about Sub Trees
Algorithm Description
Comparison with Existing HUI Mining Algorithms
Experimental Results
Evaluation of Computational Efficiency
Evaluation of Memory Usage
Conclusions
