A heuristic approach for load balancing the FP-growth algorithm on MapReduce

Sikha Bagui, Keerthi Devulapalli, John Coffey

doi:10.1016/j.array.2020.100035

Abstract

Frequent itemset discovery is an important step in association rule mining. The Frequent Pattern (FP) growth algorithm, often used to discover frequent itemsets, does not scale directly to today’s Big Data, especially to large sparse datasets, so the algorithm must be distributed and parallelized. Parallel FP-growth (PFP) is a parallel implementation of the FP-growth algorithm on Hadoop’s MapReduce execution framework. Although PFP scales to large datasets, it suffers from imbalanced load across processing units. In this paper we propose Heuristic Based PFP (HBPFP), a heuristic-based load-balancing strategy for the PFP algorithm with a lower order of complexity. Our results show that HBPFP distributes the load more evenly across the Hadoop cluster nodes, runs faster than the PFP algorithm, and uses cluster resources more efficiently, especially for large sparse datasets.
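
The abstract does not describe the HBPFP heuristic itself, so the following is only a generic illustration of the load-imbalance problem it targets, not the authors' method. It assumes, hypothetically, that each frequent item carries an estimated mining cost (the `loads` values are made up) and compares a split of the item list into contiguous groups with a greedy "heaviest item to the lightest group" assignment, a standard longest-processing-time-first heuristic.

```python
import heapq

def contiguous_groups(loads, n_groups):
    """Split items, in their given order, into chunks of nearly equal size."""
    items = list(loads)
    size = -(-len(items) // n_groups)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def greedy_groups(loads, n_groups):
    """Assign each item, heaviest first, to the group with the smallest total so far."""
    heap = [(0, g) for g in range(n_groups)]  # (current total load, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(n_groups)]
    for item, load in sorted(loads.items(), key=lambda kv: -kv[1]):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    return groups

def group_loads(groups, loads):
    """Total estimated load per group."""
    return [sum(loads[i] for i in grp) for grp in groups]

# Hypothetical per-item cost estimates (illustrative values only).
loads = {"a": 90, "b": 70, "c": 40, "d": 30, "e": 20, "f": 10}

print(group_loads(contiguous_groups(loads, 3), loads))  # [160, 70, 30]
print(group_loads(greedy_groups(loads, 3), loads))      # [90, 90, 80]
```

In this toy example the heaviest group drops from 160 to 90 units of estimated work. On a cluster the slowest reducer determines job completion time, which is why a more even distribution can shorten the run.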

Journal: Array
Publication Date: Aug 8, 2020
Citations: 13
License type: cc-by-nc-nd

Similar Papers

A Parallel FP-Growth Mining Algorithm with Load Balancing Constraints for Traffic Crash Data
Yang Yang ... Zhenzhou Yuan
INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL | VOL. 17
20 Jul 2022

Extraction of association rules in a diabetic dataset using parallel FP-growth algorithm under apache spark
Youssef Fakir ... Mohamed Fakir
International Journal of Informatics and Communication Technology (IJ-ICT) | VOL. 13
01 Dec 2024

A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data
Dawen Xia ... Yantao Li
Complexity | VOL. 2018
01 Jan 2018

A Combined Horizontal Parallel Apriori Algorithm and Adaptive Frequent Pattern Growth Algorithm for Big Data Mining
International Journal of Innovative Technology and Exploring Engineering | VOL. 9
30 Dec 2020
