Apriori-based frequent itemset mining algorithms on MapReduce

Ming-Yen Lin, Pei-Yu Lee, Sue-Chen Hsueh

doi:10.1145/2184751.2184842

Abstract

Many parallelization techniques have been proposed to enhance the performance of Apriori-like frequent itemset mining algorithms. MapReduce, a programming model characterized by its map and reduce functions, has emerged as a framework that excels at mining datasets of terabyte scale or larger on either homogeneous or heterogeneous clusters. Minimizing the scheduling overhead of each map-reduce phase and maximizing the utilization of nodes in each phase are key to successful MapReduce implementations. In this paper, we propose three algorithms, named SPC, FPC, and DPC, to investigate effective implementations of the Apriori algorithm in the MapReduce framework. DPC dynamically combines candidates of various lengths and outperforms both the straightforward algorithm SPC and the fixed passes combined counting algorithm FPC. Extensive experimental results also show that all three algorithms scale up linearly with respect to dataset size and cluster size.
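The abstract does not spell out how the three algorithms work, so the following is only a minimal sketch of the general idea behind level-wise Apriori counting expressed as map and reduce steps. It assumes that the straightforward approach runs one map-reduce phase per candidate length; that reading, the in-memory simulation (no Hadoop cluster involved), and all function names (`map_phase`, `reduce_phase`, `generate_candidates`, `mine`) are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, candidates):
    """Emit (candidate, 1) for every candidate itemset contained in the transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if set(c) <= items]


def reduce_phase(pairs, min_support):
    """Sum the emitted counts per candidate and keep those meeting min_support."""
    counts = defaultdict(int)
    for candidate, one in pairs:
        counts[candidate] += one
    return {c: n for c, n in counts.items() if n >= min_support}


def generate_candidates(frequent, k):
    """Enumerate k-itemsets over the surviving items and keep only those
    whose (k-1)-subsets are all frequent (the Apriori pruning property)."""
    items = sorted({i for itemset in frequent for i in itemset})
    return [
        c for c in combinations(items, k)
        if all(sub in frequent for sub in combinations(c, k - 1))
    ]


def mine(transactions, min_support):
    """Hypothetical driver: one simulated map-reduce phase per candidate length."""
    candidates = sorted({(i,) for t in transactions for i in t})
    result, k = {}, 1
    while candidates:
        # Map: every transaction is scanned independently against the candidates.
        pairs = [p for t in transactions for p in map_phase(t, candidates)]
        # Reduce: counts are aggregated and filtered by the support threshold.
        frequent = reduce_phase(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result
```

In this sketch every candidate length costs a full phase, which is where per-phase scheduling overhead accumulates. Counting candidates of several lengths within a single phase, as the abstract describes for FPC and DPC, would reduce the number of phases; how those combinations are chosen is not stated in the abstract and is not modeled here.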

Similar Papers

A review on Frequent Itemset Mining algorithms in social network data
Ankit N Dharsandiya ... Mihir R Patel
01 Mar 2016

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream
Guangtao Wang ... Gao Cong
ACM Transactions on Knowledge Discovery from Data | VOL. 16
21 Jul 2021

A new algorithm for fast mining frequent itemsets using N-lists
Zhihong Deng ... Zhonghui Wang
Science China Information Sciences | VOL. 55
19 Jul 2012

Observations on factors affecting performance of MapReduce based Apriori on Hadoop cluster
Sudhakar Singh ... Rakhi Garg
01 Apr 2016
