Abstract

Frequent itemset discovery is an important step in Association Rule Mining. The Frequent Pattern (FP) growth algorithm, often used for discovering frequent itemsets, does not scale directly to today’s Big Data, especially for large sparse datasets. Hence there is a need to distribute and parallelize the FP-growth algorithm. Parallel FP-growth (PFP) is a parallel implementation of the FP-growth algorithm on Hadoop’s MapReduce execution framework. Though PFP scales to large datasets, it suffers from imbalanced load across processing units. In this paper we propose a heuristic-based load-balancing strategy with a lower order of complexity for the PFP algorithm, called Heuristic Based PFP (HBPFP). Our results show that HBPFP distributes the load more evenly across the Hadoop cluster nodes, runs faster than the PFP algorithm, and uses cluster resources more efficiently, especially for large sparse datasets.
