MapFIM+: Memory Aware Parallelized Frequent Itemset Mining In Very Large Datasets

Khanh-Chuong Duong, Dominique Li, Arnaud Giacometti, Mostafa Bamha, Christel Vrain, Arnaud Soulet

doi:10.1007/978-3-662-58415-6_7

Abstract

Mining frequent itemsets in large datasets has received much attention in recent years, with many approaches relying on the MapReduce programming model. For instance, efficient Frequent Itemset Mining (FIM) algorithms such as Parallel Apriori, Parallel FP-Growth and Dist-Eclat have been parallelized following the MapReduce paradigm. However, most approaches focus on job partitioning and/or load balancing without considering extensibility with respect to memory requirements. A challenge in designing parallel FIM algorithms therefore consists in finding ways to guarantee that the data structures used during the mining process always fit in the local memory of the processing nodes at every computation step. In this paper, we propose MapFIM+, a two-phase approach to frequent itemset mining in very large datasets that benefits both from a MapReduce-based distributed Apriori method and from local in-memory FIM methods. In our approach, MapReduce is first used to generate frequent itemsets until the prefix-projected databases fit in local memory; an optimized local in-memory mining process is then launched on individual processing nodes to generate all remaining frequent itemsets from each prefix-projected database. MapFIM+ improves our previous algorithm, MapFIM, by using an exact evaluation of prefix-projected database sizes during the MapReduce phase. This improvement makes MapFIM+ more efficient, especially for databases leading to huge candidate sets, by significantly reducing communication and disk I/O costs. Performance evaluation shows that MapFIM+ is more efficient and more extensible than existing MapReduce-based frequent itemset mining approaches.
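The two-phase scheme described above can be illustrated with a small, single-process Python sketch. It is not the authors' implementation: the function names (`mapfim_sketch`, `project`, `db_size`, `local_mine`) are hypothetical, both phases run sequentially in one process instead of as MapReduce jobs and per-node tasks, the level-wise phase extends each frequent prefix by counting items in its projected database rather than performing a full Apriori candidate join, and the memory budget is assumed to be the total number of item occurrences in a prefix-projected database. What it does show is the control flow: a prefix is extended level by level while the exact size of its projected database exceeds the budget, and is handed to local in-memory mining as soon as it fits.

```python
# Hypothetical sketch of a two-phase, memory-aware FIM scheme; not the MapFIM+ code.
from collections import Counter


def project(db, prefix):
    """Prefix-projected database of `prefix`: for every transaction that
    contains all prefix items, keep only the items sorting after the last
    prefix item. Empty tails are dropped since they cannot extend the prefix."""
    pset, last = set(prefix), prefix[-1]
    tails = (tuple(i for i in t if i > last) for t in db if pset.issubset(t))
    return [t for t in tails if t]


def db_size(proj):
    """Assumed size measure: total number of item occurrences."""
    return sum(len(t) for t in proj)


def local_mine(prefix, proj, minsup, out):
    """Phase 2 stand-in: depth-first, in-memory mining of one projected DB."""
    counts = Counter(i for t in proj for i in t)
    for i, c in sorted(counts.items()):
        if c >= minsup:
            new = prefix + (i,)
            out[new] = c
            # Re-project on item i: keeps transactions containing i, items > i.
            local_mine(new, project(proj, (i,)), minsup, out)


def mapfim_sketch(db, minsup, budget):
    """Returns (frequent itemsets -> support, prefixes handed to phase 2).
    Transactions are sorted tuples of comparable items."""
    out, handoff, local_tasks = {}, [], []
    # Phase 1, level 1: global item counts (a distributed job in the paper's setting).
    counts = Counter(i for t in db for i in t)
    frontier = [(i,) for i, c in sorted(counts.items()) if c >= minsup]
    out.update({p: counts[p[0]] for p in frontier})
    while frontier:
        next_frontier = []
        for prefix in frontier:
            proj = project(db, prefix)
            if db_size(proj) <= budget:
                # Exact size fits the assumed node memory: hand off to phase 2.
                handoff.append(prefix)
                local_tasks.append((prefix, proj))
            else:
                # Too large: extend the prefix by one item at the next level.
                ext = Counter(i for t in proj for i in t)
                for i, c in sorted(ext.items()):
                    if c >= minsup:
                        out[prefix + (i,)] = c
                        next_frontier.append(prefix + (i,))
        frontier = next_frontier
    # Phase 2: each task would run independently on its own processing node.
    for prefix, proj in local_tasks:
        local_mine(prefix, proj, minsup, out)
    return out, handoff


if __name__ == "__main__":
    db = [(1, 2, 3), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
    itemsets, handoff = mapfim_sketch(db, minsup=2, budget=3)
    print(len(itemsets), handoff)
```

Lowering `budget` keeps more levels in the first phase and hands smaller projected databases to the second; the set of frequent itemsets returned is the same either way, and only the split of work between the two phases changes.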

Similar Papers

Supporting efficient and scalable frequent pattern mining
Guimei Liu
23 Dec 2014

Performance oriented mining of utility frequent itemsets
A Sakthi Nathiarasan ... M Manikandan
01 Nov 2014

Efficient mining of frequent itemsets in social network data based on MapReduce framework
Zahra Farzanyar ... Nick Cercone
25 Aug 2013

Stable Periodic Frequent Itemset Mining on Uncertain Datasets
Ruimeng He ... Yuxin Duan
13 Aug 2021