Abstract
With the rapid growth of data volume and the diversification of demand, there is an urgent need to extract useful frequent itemsets from datasets of different scales. Traditional methods can certainly solve this problem, but they do not fully exploit the relationships among datasets of different scales. The fast approach proposed in this paper is as follows: once the frequent itemsets of the small-scale datasets have been mined, the frequent itemsets of the large-scale dataset are inferred directly from them, rather than mined again from the large-scale dataset itself. We conduct extensive experiments on one synthetic dataset and four UCI datasets. The experimental results show that our algorithm is significantly faster and consumes less memory than the leading algorithms.
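The paper's exact up-scaling rule is not reproduced here, but the general idea can be sketched with the classic partition property: when the large-scale dataset is the union of the small-scale ones and the same relative support threshold is used throughout, every globally frequent itemset must be locally frequent in at least one small-scale dataset. The minimal Python sketch below (the names local_frequent_itemsets and up_scale are illustrative, not taken from the paper) reuses the local results and recounts an itemset in a partition only when its count is unknown there.

```python
from itertools import combinations
from collections import defaultdict

def local_frequent_itemsets(transactions, min_sup, max_len=3):
    """Mine the frequent itemsets of one small-scale dataset by plain counting.
    Stand-in for any standard miner such as Apriori or FP-growth."""
    counts = defaultdict(int)
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return {iset: c for iset, c in counts.items() if c >= min_sup}

def up_scale(local_results, partitions, min_sup_total):
    """Infer the frequent itemsets of the union of all partitions from the
    per-partition results, recounting an itemset in a partition only when its
    local count is unknown (i.e. it was locally infrequent there).  Relies on
    the partition property: a globally frequent itemset is locally frequent in
    at least one partition when relative thresholds are consistent."""
    candidates = set().union(*(r.keys() for r in local_results))
    global_counts = defaultdict(int)
    for cand in candidates:
        for result, part in zip(local_results, partitions):
            if cand in result:                 # reuse the already-known count
                global_counts[cand] += result[cand]
            else:                              # recount only where it is missing
                s = set(cand)
                global_counts[cand] += sum(1 for t in part if s <= set(t))
    return {c: n for c, n in global_counts.items() if n >= min_sup_total}

# Toy example: two "small-scale" datasets whose union is the large-scale one.
part1 = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
part2 = [["a", "c"], ["a", "b", "c"], ["c"]]
locals_ = [local_frequent_itemsets(p, min_sup=2) for p in (part1, part2)]
print(up_scale(locals_, [part1, part2], min_sup_total=4))
```

The point of the sketch is only that the large-scale result is assembled from the small-scale results, with the original transactions consulted just to fill in missing counts; the paper's own inference step may differ.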
Highlights
To analyze customers' buying behavior from transaction databases, Agrawal et al. first presented frequent itemset mining in 1993 [1]. It is one of the critical data mining tasks and has been widely used in many other significant data mining tasks, including mining associations and correlations, classification, and clustering.
The contributions of this paper are listed as follows: 1) This paper presents a novel framework for mining frequent itemsets from datasets of different scales.
2) We introduce a method (up-scaling) that computes the frequent itemsets of the large-scale dataset from the frequent itemsets of the small-scale datasets rather than from the original data.
Summary
To analyze customers' buying behavior from transaction databases, Agrawal et al. first presented frequent itemset mining in 1993 [1]. It is one of the critical data mining tasks and has been widely used in many other significant data mining tasks, including mining associations and correlations, classification, and clustering. After Apriori was proposed, several improved algorithms followed, because Apriori needs to scan the database repeatedly. These algorithms share a common feature: they generate candidate itemsets. The FP-growth algorithm is a classic representative that does not generate candidate itemsets; it compresses the database, restricted to its frequent items, into an FP-tree, which retains the itemset association information [2]. To enhance the efficiency of frequent itemset mining, three data structures were presented by Deng et al., named Node-list, N-list, and Nodeset. Despite the advantages of Nodeset, two further data structures (DiffNodeset [3] and NegNodeset [4]) were proposed by Deng et al. and Aryabarzan et al., along with two corresponding algorithms, dFIN and negFIN.
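As a point of reference for the candidate-generation behaviour mentioned above, the following minimal Python sketch shows the textbook Apriori join-and-prune step; apriori_gen and its inputs are illustrative and are not part of the surveyed algorithms.

```python
from itertools import combinations

def apriori_gen(frequent_km1, k):
    """Textbook Apriori candidate generation: join frequent (k-1)-itemsets that
    share their first k-2 items, then prune any candidate that has an
    infrequent (k-1)-subset.  Illustrative only."""
    prev = sorted(frequent_km1)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:k - 2] == b[:k - 2]:                     # join step
                cand = tuple(sorted(set(a) | set(b)))
                # prune step: every (k-1)-subset must itself be frequent
                if all(sub in frequent_km1
                       for sub in combinations(cand, k - 1)):
                    candidates.append(cand)
    return candidates

# Example: frequent 2-itemsets -> candidate 3-itemsets
f2 = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")}
print(apriori_gen(f2, 3))   # [('a', 'b', 'c')]; ('b', 'c', 'd') is pruned
```

FP-growth, dFIN, and negFIN avoid this explicit candidate generation by working on compressed structures (FP-tree, N-list/Nodeset variants) instead.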