New and Efficient Algorithms for Producing Frequent Itemsets with the Map-Reduce Framework

Yaron Gonen, Kirill Kandalov, Ehud Gudes

doi:10.3390/a11120194

Open Access

Abstract

The Map-Reduce (MR) framework has become a popular platform for developing new parallel algorithms for Big Data. Developing efficient algorithms for data mining of big data and distributed databases has become an important problem. In this paper we focus on algorithms that produce association rules and frequent itemsets. After reviewing the most recent algorithms that perform this task within the MR framework, we present two new algorithms: one for producing closed frequent itemsets, and another for producing frequent itemsets when the database is updated and new data is added to the existing database. Both algorithms include novel optimizations that are suitable for the MR framework as well as for other parallel architectures. A detailed experimental evaluation shows the effectiveness of the algorithms and their advantages over existing methods on large distributed databases.
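
The abstract mentions closed frequent itemsets without defining them: a frequent itemset is closed if no proper superset has the same support. The following Python sketch only illustrates that definition on a toy database. It is not the authors' MR algorithm; it counts by brute force on a single machine, and the names support_counts, closed_itemsets and min_count are hypothetical.

```python
from itertools import combinations


def support_counts(transactions, min_count):
    """Count every itemset by brute force and keep those with count >= min_count."""
    items = sorted({i for t in transactions for i in t})
    counts = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            n = sum(1 for t in transactions if itemset <= t)
            if n >= min_count:
                counts[itemset] = n
    return counts


def closed_itemsets(frequent):
    """Keep frequent itemsets that have no proper superset with equal support."""
    return {
        s: n
        for s, n in frequent.items()
        if not any(s < t and n == m for t, m in frequent.items())
    }


# Hypothetical toy database of four transactions.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
frequent = support_counts(transactions, min_count=2)
closed = closed_itemsets(frequent)
```

Checking only frequent supersets is sufficient, because any superset with the same support as a frequent itemset is itself frequent. Here {b} and {c} are frequent but not closed, since {a, b} and {a, c} have the same support. The enumeration is exponential in the number of distinct items and is meant for toy data only.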

Highlights

The amount of information generated in our world has grown at an exponential rate over the last few decades
A prerequisite for finding association rules is frequent itemset mining (FIM)
At the end of each iteration, the candidates are pruned by their count in the DB, and those that survive are added to the final set of frequent itemsets
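
The last highlight describes the counting-and-pruning step of each iteration. A minimal single-process sketch of how that step can be laid out as map, shuffle and reduce phases, assuming each "split" stands in for one partition of the database held by a different node: mappers emit (candidate, 1) pairs, the shuffle groups them by candidate, and reducers sum the counts before pruning by a minimum count. This is a generic scheme, not the paper's algorithm; it uses no Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(transaction, candidates):
    """Mapper: emit (candidate, 1) for every candidate contained in the transaction."""
    for cand in candidates:
        if cand <= transaction:
            yield cand, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: total support count of one candidate."""
    return key, sum(values)


def count_and_prune(splits, candidates, min_count):
    """One iteration: count candidates over all splits, keep those with count >= min_count."""
    pairs = [kv for split in splits for t in split for kv in map_phase(t, candidates)]
    counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
    return {c: n for c, n in counts.items() if n >= min_count}


# Hypothetical toy database split across two "nodes".
splits = [
    [frozenset("abc"), frozenset("ab")],
    [frozenset("ac"), frozenset("a")],
]
candidates = [frozenset("ab"), frozenset("ac"), frozenset("bc")]
survivors = count_and_prune(splits, candidates, min_count=2)
```

With this data, {a, b} and {a, c} each reach a count of 2 and survive, while {b, c} appears only once and is pruned.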

Summary

Introduction

The amount of information generated in our world has grown at an exponential rate over the last few decades. One of the most common tools in use today for processing such data is the Map-Reduce (MR) framework [1]. It was originally developed by Google, but the version most studied by researchers is the open-source project Hadoop [2]. One of the best-known algorithms for association rules is the Apriori algorithm described in [5,6]. This algorithm relies on a pruning rule, also called Apriori, which states that an itemset can be frequent only if all of its subsets are frequent. The algorithm iteratively generates candidate frequent itemsets and prunes them. It stops when it can no longer generate longer candidates, and then derives all association rules from the frequent itemsets.
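
The loop described above can be sketched on a single machine as follows. This is an illustration under stated assumptions, not the implementation from [5,6]: candidates of size k are built with a simple union-based join (rather than the classic prefix join) and then filtered with the Apriori property, and rules are kept if their confidence, support(X ∪ Y) / support(X), reaches a threshold min_conf. The thresholds and all function names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # level 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:  # stop when no longer candidates survive
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count: one scan of the database, keep candidates meeting min_count.
        level = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                level[c] = n
        frequent.update(level)
        k += 1
    return frequent


def association_rules(frequent, min_conf):
    """Return (lhs, rhs, confidence) for rules lhs -> rhs with confidence >= min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]  # lhs is frequent, so it is in the dict
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical toy database.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
frequent = apriori(db, min_count=2)
rules = association_rules(frequent, min_conf=1.0)
```

On this data the candidate {a, b, c} is pruned without a database scan, because its subset {b, c} is not frequent; the only rules with confidence 1.0 are b → a and c → a.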

Journal: Algorithms
Publication Date: Nov 28, 2018
License type: CC BY 4.0

Similar Papers

Large-scale data mining analytics based on MapReduce
01 Jan 2014

A Novel Nodesets-Based Frequent Itemset Mining Algorithm for Big Data using MapReduce
Borra Sivaiah ... Ramisetty Rajeswara Rao
International journal of electrical and computer engineering systems | VOL. 14
14 Nov 2023

Sequence-Growth: A Scalable and Effective Frequent Itemset Mining Algorithm for Big Data Based on MapReduce Framework
Yen-Hui Liang ... Shiow-Yang Wu
01 Jun 2015

A parallel algorithm for approximate frequent itemset mining using MapReduce
Fabio Fumarola ... Donato Malerba
01 Jul 2014
