Abstract

Mining approximate frequent itemsets (AFIs) from noisy databases is computationally more expensive than traditional frequent itemset mining, because AFI mining algorithms generate a large number of candidate itemsets. This article proposes an algorithm that mines AFIs using a pattern growth approach. The major contribution of the proposed approach is that it mines core patterns and examines the approximate conditions of candidate AFIs directly, in a single phase with two full database scans. Related algorithms apply the Apriori-based candidate generation-and-test approach and require multiple phases to obtain the complete set of AFIs: the first phase generates core patterns, and the second phase examines the approximate conditions of those core patterns. Specifically, the article proposes novel techniques for mapping transactions onto an approximate FP-tree and for mining AFIs from the conditional patterns of the approximate FP-tree. The approximate FP-tree maps transactions onto shared branches when the transactions share a similar set of items. This reduces the size of the database representation and helps to efficiently compute the approximate conditions of candidate itemsets. We compare the performance of our algorithm with state-of-the-art AFI mining algorithms on benchmark databases, analyzing the processing time of the algorithms and their scalability with varying database size and transaction length. The results show that the pattern growth approach mines AFIs in less processing time than related Apriori-based algorithms.
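The idea of mapping transactions that share items onto shared branches builds on the FP-tree used in pattern growth mining. The sketch below is a minimal Python version of a standard, exact FP-tree, not the paper's approximate FP-tree, whose insertion rules are not given in this summary. It shows the two database scans (one to count item supports, one to insert each transaction's frequent items in descending-support order) and how the conditional pattern base of an item is read off its node chain. The function names, the alphabetical tie-breaking, and the toy database are illustrative assumptions.

```python
from collections import Counter, defaultdict


class Node:
    """One FP-tree node: an item, its count, a parent link, and child links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Scan 1: count the support of every single item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)  # item -> nodes carrying that item (hypothetical header table)

    # Scan 2: insert each transaction's frequent items, most frequent first
    # (ties broken alphabetically, an assumption), so that transactions
    # with common items share a prefix path in the tree.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, frequent


def conditional_pattern_base(header, item):
    """Return (prefix path, count) pairs for every node that carries `item`."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical toy database: 10 frequent-item occurrences collapse into 4 nodes.
db = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c", "e"], ["b", "c"]]
root, header, frequent = build_fp_tree(db, min_count=2)
print(count_nodes(root))                    # → 4
print(conditional_pattern_base(header, "c"))  # → [(['b', 'a'], 2), (['b'], 1)]
```

Pattern growth then recurses on each conditional pattern base instead of generating and testing candidates level by level; an approximate variant would additionally have to track how many itemset items each branch misses.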

Highlights

  • Mining frequent itemsets from databases is an important data mining task

  • Our proposed algorithm mines approximate frequent itemsets (AFIs) using the concept of core patterns [10,11] by exploring the complete search space of candidate itemsets

  • We analyze the performance of AFI mining algorithms from the following three aspects. First, we compare all algorithms in terms of the processing time they consume to mine the complete set of AFIs. Second, we compare the performance of the algorithms on varying database sizes.

Summary

Introduction

Mining frequent itemsets from databases is an important data mining task. It has many practical applications, including document clustering [15, 40], social network analysis [23, 34], market basket analysis [17], fraud detection [14], bioinformatics [13, 28, 33], and mining patterns from web logs [22, 38]. Transactions 10, 20, and 50 each contain three of the four items of efcb, and every single item of efcb appears in at least two of these transactions (10, 20, and 50). This approximate-matching concept is appealing because it discovers long frequent itemsets.

To examine the AFI conditions of core patterns, the Apriori-based algorithm scans the original database multiple times to calculate the supports of itemsets and items.
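The efcb example can be made concrete with a common formulation of the AFI conditions, assumed here (the paper's exact definitions may differ): a row tolerance ε_r (each supporting transaction may miss at most a fraction ε_r of the itemset's items), a column tolerance ε_c (each item must appear in at least a fraction 1 − ε_c of the supporting transactions), and a minimum support count. In the example, missing one of four items gives ε_r = 1/4, and every item appearing in at least two of three transactions gives ε_c = 1/3. The database below is hypothetical, built only to match that description, and `is_afi` is an illustrative helper, not the paper's algorithm.

```python
import math
from fractions import Fraction


def is_afi(itemset, db, eps_row, eps_col, min_count):
    """Check the row/column AFI conditions for `itemset` on `db`.

    db maps a transaction id to its set of items. Simplification: the
    supporting set is every transaction that passes the row condition, and
    the column condition is checked on that set only. A True result is
    therefore sound, but False may miss an AFI that holds for a smaller subset.
    """
    items = set(itemset)
    need_per_row = math.ceil((1 - eps_row) * len(items))
    # Row condition: a transaction may miss at most eps_row * |I| items.
    support_set = [tid for tid, t in db.items() if len(items & t) >= need_per_row]
    if len(support_set) < min_count:
        return False, support_set
    need_per_col = math.ceil((1 - eps_col) * len(support_set))
    # Column condition: each item must occur in enough supporting transactions.
    for item in items:
        hits = sum(1 for tid in support_set if item in db[tid])
        if hits < need_per_col:
            return False, support_set
    return True, support_set


# Hypothetical database consistent with the efcb example in the text.
db = {
    10: {"e", "f", "c", "a"},
    20: {"f", "c", "b"},
    30: {"a", "d"},
    40: {"a", "b", "d"},
    50: {"e", "c", "b"},
}
efcb = set("efcb")
exact = sum(1 for t in db.values() if efcb <= t)
ok, tids = is_afi(efcb, db, Fraction(1, 4), Fraction(1, 3), min_count=3)
print(exact)     # → 0
print(ok, tids)  # → True [10, 20, 50]
```

Exact matching finds no transaction containing all of efcb, while the approximate conditions accept it with support 3; this is why AFI mining can surface long itemsets that noise would otherwise break apart. Exact `Fraction` tolerances avoid floating-point rounding in the `ceil` thresholds.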

Related Work
Design and Construction
Mining Approximate Frequent Itemsets from Apx-FP-tree
Experiments
Findings
Conclusion