Abstract

High-utility itemset mining (HUIM), an extension of the well-known frequent itemset mining (FIM) problem, has become a key research topic in recent years. HUIM aims to find the complete set of itemsets with high utilities in a given dataset. High average-utility itemset mining (HAUIM) is a variation of traditional HUIM that uses an alternative measure, the average-utility, to discover itemsets by taking into account both the utility values and the lengths of itemsets. HAUIM is important in several application domains, such as business applications, medical data analysis, mobile commerce, and streaming data analysis. Several algorithms in the literature introduce their own upper-bound models and data structures to discover high average-utility itemsets (HAUIs) in a given database. However, they require long execution times and large amounts of memory. To overcome these limitations, this paper first introduces four novel upper bounds, along with pruning strategies, and two data structures. It then proposes a pattern-growth approach, the HAUL-Growth algorithm, which uses the proposed upper bounds and data structures to mine HAUIs efficiently. Experimental results show that HAUL-Growth significantly outperforms the state-of-the-art dHAUIM and TUB-HAUIM algorithms in terms of execution time, number of join operations, memory consumption, and scalability.
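To make the difference between FIM and HUIM concrete, here is a minimal Python sketch. It assumes a hypothetical toy database in which each transaction records purchase quantities and each item has a unit profit, with utility taken as quantity × unit profit (the usual convention in the HUIM literature). The names `TRANSACTIONS`, `UNIT_PROFIT`, `support_counts`, and `utilities` are illustrative only and do not come from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchase quantity.
TRANSACTIONS = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2, "wine": 1},
    {"bread": 3},
    {"milk": 1, "wine": 2},
]
# Hypothetical external utility (unit profit) of each item.
UNIT_PROFIT = {"bread": 1, "milk": 1, "wine": 10}


def support_counts(transactions, k):
    """FIM view: number of transactions containing each k-itemset (quantities ignored)."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), k))
    return counts


def utilities(transactions, profit, k):
    """HUIM view: quantity x unit profit of each k-itemset, summed over transactions containing it."""
    totals = Counter()
    for t in transactions:
        for itemset in combinations(sorted(t), k):
            totals[itemset] += sum(t[i] * profit[i] for i in itemset)
    return totals


if __name__ == "__main__":
    print(support_counts(TRANSACTIONS, 1))
    print(utilities(TRANSACTIONS, UNIT_PROFIT, 1))
```

On this data `bread` and `milk` are the most frequent items (support 3 each), while `wine` occurs in only two transactions yet has by far the highest utility (30 versus 6 and 4). A support-only FIM view ranks it last; a utility-based view ranks it first.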

Highlights

  • Frequent itemset mining (FIM), one of the most well-known techniques for discovering relations among items in large datasets, was originally introduced to discover itemsets frequently purchased by customers [1]–[4]

  • A typical high average-utility itemset mining (HAUIM) approach aims to find the complete set of high average-utility itemsets (HAUIs) for a given minimum utility threshold (minUtil)

  • This study proposes an algorithm named High Average-Utility List-Growth (HAUL-Growth) for mining HAUIs efficiently
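The HAUIM task stated in these highlights can be illustrated with an exhaustive baseline. The sketch below assumes the common definition of average-utility, au(X) = (sum of u(X, T) over the transactions T that contain X) / |X|, where u(X, T) is the summed utility of the items of X in T. The toy database (with per-item utilities already computed) and the helper names `contains`, `average_utility`, and `brute_force_hauis` are hypothetical. Because it enumerates every itemset, this is only a reference point, not the HAUL-Growth algorithm.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (e.g., quantity x unit profit, already computed).
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6, "d": 6},
    {"b": 4, "c": 3, "e": 6},
    {"a": 5, "b": 2, "d": 3},
]


def contains(transaction, itemset):
    """True if every item of `itemset` occurs in `transaction`."""
    return all(item in transaction for item in itemset)


def average_utility(itemset, db):
    """au(X): utility of X summed over the transactions containing X, divided by |X|."""
    total = sum(
        sum(t[item] for item in itemset) for t in db if contains(t, itemset)
    )
    return total / len(itemset)


def brute_force_hauis(db, min_util):
    """Test every non-empty itemset against min_util (exponential; toy data only)."""
    items = sorted(set().union(*db))
    hauis = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            au = average_utility(itemset, db)
            if au >= min_util:
                hauis[itemset] = au
    return hauis


if __name__ == "__main__":
    print(brute_force_hauis(DB, min_util=10))
```

With `min_util=10`, the HAUIs are {a} (20), {c} (10), {a, c} (11), and {a, d} (12). The number of candidates grows as 2^n in the number of distinct items, which is why practical HAUIM algorithms rely on upper bounds and compact data structures rather than full enumeration.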


Summary

Celik: Efficient Tree-Based Algorithm for Mining High Average-Utility Itemset

INTRODUCTION

Frequent itemset mining (FIM), one of the most well-known techniques for discovering relations among items in large datasets, was originally introduced to discover itemsets frequently purchased by customers [1]–[4]. The problem of high-utility itemset mining (HUIM) [5], [6] was introduced as an extension of FIM to discover more meaningful itemsets by taking into account non-binary attributes of items. However, because HUIM evaluates an itemset by its total utility without considering its length, most of the discovered HUIs may contain items with low utilities. To address this limitation, the problem of high average-utility itemset mining (HAUIM) was introduced with a fairer measure named average-utility [7]. A typical HAUIM approach aims to find the complete set of HAUIs for a given minUtil threshold. This task is computationally complex because the average-utility of itemsets is not anti-monotonic: a superset can have a higher average-utility than its subsets, so the downward-closure property cannot be used directly to prune the search space.
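The missing anti-monotonicity, and the usual remedy of pruning with an anti-monotone upper bound, can be seen on a small example. The sketch below uses the classic transaction-maximum-utility bound from earlier HAUIM work, auub(X) = sum of tmu(T) over the transactions T containing X, where tmu(T) is the largest item utility in T. It is not one of the four upper bounds proposed in this paper. Since every item of X in T has utility at most tmu(T), au(X) ≤ auub(X); and since a superset of X occurs in no more transactions than X, auub never increases as X grows. An itemset with auub(X) < minUtil can therefore be pruned together with all its supersets. The toy database, the Apriori-style level-wise search, and all function names are assumptions for illustration; HAUL-Growth itself is described as a pattern-growth algorithm.

```python
from itertools import combinations

# Hypothetical toy database: item -> utility of that item in the transaction.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6, "d": 6},
    {"b": 4, "c": 3, "e": 6},
    {"a": 5, "b": 2, "d": 3},
]


def contains(t, itemset):
    return all(i in t for i in itemset)


def average_utility(itemset, db):
    """au(X) = sum of u(X, T) over transactions containing X, divided by |X|."""
    total = sum(sum(t[i] for i in itemset) for t in db if contains(t, itemset))
    return total / len(itemset)


def auub(itemset, db):
    """Classic upper bound: sum of tmu(T) (max item utility in T) over T containing X."""
    return sum(max(t.values()) for t in db if contains(t, itemset))


def mine_hauis_with_auub(db, min_util):
    """Apriori-style level-wise search that prunes candidates with auub < min_util."""
    items = sorted(set().union(*db))
    level = [(i,) for i in items if auub((i,), db) >= min_util]
    hauis = {}
    while level:
        # auub is only a bound, so each surviving candidate is still checked exactly.
        for itemset in level:
            au = average_utility(itemset, db)
            if au >= min_util:
                hauis[itemset] = au
        survivors = set(level)
        next_level = []
        # Join k-itemsets that share a (k-1)-prefix; `level` is kept sorted.
        for x, y in combinations(level, 2):
            if x[:-1] != y[:-1]:
                continue
            cand = x + (y[-1],)
            # auub is anti-monotone, so every k-subset must have survived.
            subsets_ok = all(s in survivors for s in combinations(cand, len(cand) - 1))
            if subsets_ok and auub(cand, db) >= min_util:
                next_level.append(cand)
        level = sorted(next_level)
    return hauis


if __name__ == "__main__":
    # {d} fails the threshold, yet its superset {a, d} passes: au is not anti-monotone.
    print(average_utility(("d",), DB), average_utility(("a", "d"), DB))  # 9.0 12.0
    print(mine_hauis_with_auub(DB, min_util=10))
```

With `min_util=10`, the bound discards {e}, {b, d}, and {a, b, c} without computing their exact average-utilities, and the search returns the same four HAUIs an exhaustive scan would find. Pruning on au itself would have been unsafe here, because dropping {d} would also have dropped {a, d}.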

RELATED WORK
PROPOSED DATA STRUCTURES
VIII. CONCLUSION AND FUTURE WORKS
