Abstract

The Map-Reduce (MR) framework has become a popular platform for developing new parallel algorithms for Big Data. Designing efficient algorithms for mining big data and distributed databases has become an important problem. In this paper we focus on algorithms that produce association rules and frequent itemsets. After reviewing the most recent algorithms that perform this task within the MR framework, we present two new algorithms: one for producing closed frequent itemsets, and a second for producing frequent itemsets when the database is updated and new data is added to the old database. Both algorithms include novel optimizations that are suited to the MR framework as well as to other parallel architectures. A detailed experimental evaluation shows the effectiveness and advantages of the algorithms over existing methods on large distributed databases.

Highlights

  • The amount of information generated in our world has grown in the last few decades at an exponential rate

  • A prerequisite for finding association rules is frequent itemset mining (FIM)

  • At the end of each iteration, the candidates are pruned by their count in the DB, and those that survive are added to the final set of frequent itemsets

Summary

Introduction

The amount of information generated in our world has grown in the last few decades at an exponential rate. One of the common tools in use today is the Map-Reduce (MR) framework [1]. It was originally developed by Google, but currently the most researched version is an open source project called Hadoop [2]. One of the most well-known algorithms for association rules is the Apriori algorithm described in [5,6]. This algorithm uses a pruning rule called Apriori, which states that an itemset can be frequent only if all its subsets are frequent. The algorithm is based on iteratively generating candidates for frequent itemsets and pruning them. The algorithm stops when it cannot generate longer candidates, and it then generates all association rules from the frequent itemsets.
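
The candidate-generation and pruning loop described above can be illustrated with a minimal, single-machine Python sketch. It is not the paper's MR implementation: the function name `apriori`, the `min_support` parameter (an absolute count threshold), and the toy transactions are assumptions made for illustration, and the rule-generation step is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset whose count is >= min_support.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Generate size-k candidates by joining frequent (k-1)-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Apriori pruning rule: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count each surviving candidate in the database, then prune by count.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    # The loop ends once no longer candidates can be generated.
    return frequent


# Toy database (assumed data, not taken from the paper).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(transactions, min_support=2)
```

In this toy run the pair {milk, butter} occurs only once, so the three-item candidate {bread, milk, butter} is discarded by the pruning rule without being counted. In an MR setting the counting step is the natural part to distribute, but how the paper does so is not shown here.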
