FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce

Yaling Xun, Xiao Qin, Jifu Zhang

doi:10.1109/tsmc.2015.2437327

Abstract

Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. As a solution to this problem, we design a parallel frequent itemsets mining algorithm called FiDoop using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultrametric tree rather than conventional FP trees. In FiDoop, three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose itemsets, and the reducers perform combination operations by constructing small ultrametric trees and then mine these trees separately. We implement FiDoop on our in-house Hadoop cluster. We show that FiDoop on the cluster is sensitive to data distribution and dimensions, because itemsets with different lengths have different decomposition and construction costs. To improve FiDoop’s performance, we develop a workload balance metric to measure load balance across the cluster’s computing nodes. We develop FiDoop-HD, an extension of FiDoop, to speed up mining for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.
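
The decompose-then-combine pattern of the third job can be illustrated with a small in-memory sketch. This is not the authors' implementation: it simulates the map, shuffle, and reduce phases in plain Python rather than on Hadoop, and it replaces the ultrametric trees with simple counters. The function names (`map_decompose`, `reduce_combine`, `mine`), the choice of itemset length as the shuffle key, and the assumption that the input transactions have already been stripped of infrequent items are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def map_decompose(itemset, count):
    """Mapper (illustrative): emit every sub-itemset of size 2..k,
    keyed by its length, carrying the support of the source itemset."""
    items = tuple(sorted(itemset))
    for size in range(2, len(items) + 1):
        for subset in combinations(items, size):
            yield size, (subset, count)


def reduce_combine(pairs):
    """Reducer (illustrative): sum the support of identical itemsets
    of one length. A Counter stands in for the ultrametric tree."""
    support = Counter()
    for subset, count in pairs:
        support[subset] += count
    return support


def mine(pruned_transactions, min_support):
    """Run the simulated job; assumes infrequent items were removed earlier."""
    # Collapse duplicate transactions so each distinct itemset is decomposed once.
    distinct = Counter(frozenset(t) for t in pruned_transactions)
    # Shuffle phase: group mapper output by itemset length.
    groups = defaultdict(list)
    for itemset, count in distinct.items():
        for size, pair in map_decompose(itemset, count):
            groups[size].append(pair)
    # Reduce phase: one reducer call per itemset length.
    frequent = {}
    for pairs in groups.values():
        for subset, support in reduce_combine(pairs).items():
            if support >= min_support:
                frequent[subset] = support
    return frequent


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = mine(transactions, min_support=3)
```

Because a k-itemset has on the order of 2^k sub-itemsets, the mapper's cost in this sketch grows quickly with itemset length, which is consistent with the abstract's remark that itemsets of different lengths incur different decomposition and construction costs.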


Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Publication Date: Mar 1, 2016
Citations: 93
License type: publisher-specific-oa

Similar Papers

Parallel Processing of Frequent Itemset Based on MapReduce Programming Model
Rajshree A Deshmukh ... Bharathi H N
01 Sep 2019

Parallel Subdomain-Level DGTD Method With Automatic Load Balancing Scheme With Tetrahedral and Hexahedral Elements
Jiamei Mi ... Donglin Su
IEEE Transactions on Antennas and Propagation | VOL. 69
01 Oct 2020

Automatic data distribution and load balancing with space-filling curves: implementation in CONQUEST
V Brázdová ... D R Bowler
Journal of Physics: Condensed Matter | VOL. 20
04 Jun 2008

HDFS framework for efficient frequent itemset mining using MapReduce
Prajakta G Kulkarni ... Shraddha R Khonde
01 Oct 2017
