Abstract

In this paper, we present a Hadoop implementation of the Apriori algorithm. Using Hadoop’s distributed, parallel MapReduce environment, we present an architecture for mining positive as well as negative association rules in big data through frequent itemset mining with the Apriori algorithm. We also analyze several optimization parameters in Hadoop’s MapReduce environment as they relate to this algorithm and report their effects. Results are reported in terms of the number of rules generated and run-time efficiency. We find that a higher degree of parallelization, which means larger block sizes, increases the run-time efficiency of the Hadoop implementation of the Apriori algorithm.


Summary

Introduction

Association rule mining, originally developed by [3], is a well-known data mining technique used to find associations between items or itemsets. In this paper, we present an architecture for mining positive as well as negative association rules in a big data environment, based on frequent itemset mining in Hadoop’s MapReduce environment. Because the Apriori algorithm requires repeated scans of the dataset, Hadoop’s parallel and distributed structure should be exploited in an optimized way when mining positive and negative association rules in big data with this algorithm.
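To make the repeated-scan structure concrete, the following is a minimal in-memory sketch of how one Apriori pass per itemset size could map onto a map step and a reduce step. It is not the paper's implementation and does not use the Hadoop API: the function names (`map_count`, `reduce_counts`, `next_candidates`, `apriori`, `rule_confidences`), the representation of data splits as Python lists, and the absolute `min_count` threshold are all assumptions made for illustration. The last function uses the standard identity conf(A ⇒ ¬B) = 1 − conf(A ⇒ B) to obtain the confidence of a negative rule.

```python
from collections import Counter
from itertools import combinations


def map_count(split, candidates):
    """Map step (hypothetical): count candidates contained in each transaction of one split."""
    counts = Counter()
    for transaction in split:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:
                counts[cand] += 1
    return counts


def reduce_counts(partials, min_count):
    """Reduce step (hypothetical): sum per-split counts, keep itemsets meeting min_count."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {c: n for c, n in total.items() if n >= min_count}


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({i for s in frequent for i in s})
    return [
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    ]


def apriori(splits, min_count):
    """Run one simulated map/reduce pass over all splits per itemset size."""
    candidates = list({frozenset([i]) for s in splits for t in s for i in t})
    frequent, k = {}, 1
    while candidates:
        # Each iteration is one full scan of the dataset, split by split.
        level = reduce_counts((map_count(s, candidates) for s in splits), min_count)
        frequent.update(level)
        k += 1
        candidates = next_candidates(level, k)
    return frequent


def rule_confidences(splits, antecedent, consequent):
    """Return (conf(A => B), conf(A => not B)); assumes A occurs at least once."""
    a = frozenset(antecedent)
    ab = a | frozenset(consequent)
    total = Counter()
    for s in splits:
        total.update(map_count(s, [a, ab]))
    positive = total[ab] / total[a]
    return positive, 1.0 - positive


# Toy data: two splits stand in for two HDFS blocks (an assumption for illustration).
splits = [
    [["bread", "milk"], ["bread", "butter"]],
    [["bread", "milk", "butter"], ["milk"]],
]
frequent = apriori(splits, min_count=2)
pos_conf, neg_conf = rule_confidences(splits, ["bread"], ["milk"])
```

In this sketch every iteration of the `while` loop rescans all splits, which is the cost that a distributed setting spreads across nodes. The negative-rule confidence is computed from raw counts and not from the frequent-itemset table, since an infrequent itemset has no entry in that table.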

