Abstract
Mining high utility itemsets (HUIs) has been an active research topic in data mining in recent years. Existing HUI mining algorithms typically take two steps: generating candidate itemsets and then identifying the utility values of those candidates. Their performance depends on the efficiency of both steps, and both are usually time-consuming. In this study, we propose an efficient pattern-growth-based HUI mining algorithm, called tail-node tree-based high-utility itemset (TNT-HUI) mining. Supported by a novel tree structure, named the tail-node tree (TN-Tree), this algorithm avoids the time-consuming candidate generation step and the need to scan the original dataset multiple times for exact utility values. The performance of TNT-HUI was evaluated against state-of-the-art benchmark methods on different datasets. Experimental results showed that TNT-HUI outperformed the benchmark algorithms in both execution time and memory use by orders of magnitude. The performance gap is larger for denser datasets and lower thresholds.
Highlights
Pattern discovery from a transactional database has been an important topic in data mining [1,2].
Problem Definition: In a transaction dataset, an itemset is a high utility itemset if its utility is not less than a user-specified minimum utility value, where the utility of an item in a transaction is defined as its internal utility multiplied by its external utility.
According to Theorem 1, the TNT-HUI algorithm removes all unpromising items from the original transaction itemsets when it creates the tail-node tree (TN-Tree) from those itemsets.
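The utility definition and the removal of unpromising items can be illustrated with a minimal Python sketch. This is not the TNT-HUI implementation and it does not build a TN-Tree. The toy transactions, the `external` table, and all function names are hypothetical. The sketch also assumes that "unpromising" has its usual meaning in the HUI literature, namely an item whose transaction-weighted utility (TWU) is below the minimum utility value; the summary above does not state Theorem 1 itself.

```python
from collections import defaultdict

# Hypothetical toy data: each transaction maps item -> internal utility (e.g. quantity).
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"b": 2, "c": 1, "d": 1},
]
# Hypothetical external utilities (e.g. unit profit) per item.
external = {"a": 5, "b": 2, "c": 1, "d": 10}


def item_utility(item, tx):
    """Utility of an item in a transaction: internal utility * external utility."""
    return tx[item] * external[item]


def itemset_utility(itemset, txs):
    """Sum the itemset's utility over every transaction that contains all its items."""
    total = 0
    for tx in txs:
        if all(i in tx for i in itemset):
            total += sum(item_utility(i, tx) for i in itemset)
    return total


def is_high_utility(itemset, txs, min_util):
    """An itemset is high utility if its utility is not less than min_util."""
    return itemset_utility(itemset, txs) >= min_util


def transaction_utility(tx):
    """Total utility of all items in one transaction."""
    return sum(item_utility(i, tx) for i in tx)


def twu(txs):
    """Assumed pruning measure: per-item sum of utilities of transactions containing it."""
    result = defaultdict(int)
    for tx in txs:
        tu = transaction_utility(tx)
        for i in tx:
            result[i] += tu
    return dict(result)


def prune_unpromising(txs, min_util):
    """Drop items whose TWU is below min_util (assumed meaning of 'unpromising')."""
    weights = twu(txs)
    return [{i: q for i, q in tx.items() if weights[i] >= min_util} for tx in txs]
```

With these toy values and a minimum utility of 15, `is_high_utility({"a"}, transactions, 15)` holds because item `a` contributes 10 and 5 in the two transactions that contain it. With a minimum utility of 18, `prune_unpromising` drops item `d`, whose TWU is 15, before any tree would be built.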
Summary
Pattern discovery from a transactional database has been an important topic in data mining [1,2]. Without the ability to retrieve exact utility values directly from the tree, existing pattern-growth-based HUI mining methods need to rescan the original dataset to identify HUIs, which requires additional passes of data I/O and adds substantial computational overhead. To address this issue, we propose a novel tree structure, called the tail-node tree (TN-Tree), from which exact utility values can be retrieved without rescanning the original dataset.