DisCANTree: A Distributed Algorithm for Incremental Frequent Itemset Mining based on MapReduce

Wen Xiao, Juan Hu

doi:10.1088/1742-6596/1682/1/012022

Abstract

Frequent itemset mining is one of the most important data mining tasks. Classical frequent itemset mining algorithms store data centrally and run in batch mode, so they cannot meet the requirements of mining rapidly updated big data. In this paper, we propose DisCANTree, a distributed incremental frequent itemset mining algorithm. DisCANTree stores each conditional database in a CANTree, balances load across nodes by grouping all items, inserts new transactions into the existing CANTree to avoid the cost of rebuilding the tree, and mines the CANTree with the efficient FPGrowth algorithm to generate frequent itemsets. DisCANTree is implemented with the popular distributed programming model MapReduce and its open-source implementation, Hadoop. Experimental results show that DisCANTree outperforms the widely used PFP algorithm in both performance and the number of records transferred between nodes, and that it is especially well suited to rapidly updated, sparse big data.
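The incremental-update idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: it assumes lexicographic order as the canonical item order, the class and method names (CanTreeNode, CanTree, insert, item_supports) are hypothetical, and it leaves out the item grouping, MapReduce, and FPGrowth mining steps. It only shows why a tree with a frequency-independent item order can absorb new transactions without being rebuilt.

```python
from collections import defaultdict


class CanTreeNode:
    """One tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class CanTree:
    """Prefix tree whose paths follow a fixed canonical item order.

    Assumption: the canonical order is plain lexicographic order.
    """

    def __init__(self):
        self.root = CanTreeNode()
        self.num_transactions = 0

    def insert(self, transaction):
        # The order does not depend on item frequency, so existing paths
        # never need reordering when later transactions arrive.
        node = self.root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, CanTreeNode(item))
            node.count += 1
        self.num_transactions += 1

    def item_supports(self):
        """Return the support count of every single item in the tree."""
        supports = defaultdict(int)
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            supports[node.item] += node.count
            stack.extend(node.children.values())
        return dict(supports)


# Initial batch of transactions.
tree = CanTree()
for t in [["b", "a", "c"], ["a", "c"], ["a", "d"]]:
    tree.insert(t)

# Incremental update: the new batch goes into the same tree, no rebuild.
for t in [["c", "a"], ["b", "d"]]:
    tree.insert(t)

supports = tree.item_supports()
```

In this sketch the transactions ["a", "c"] and ["c", "a"] share one path because both are sorted before insertion, which is what lets later batches reuse the existing structure.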

Journal: Journal of Physics: Conference Series	Publication Date: Nov 1, 2020
License type: cc-by

Similar Papers

A new algorithm for fast mining frequent itemsets using N-lists
Zhihong Deng ... Zhonghui Wang
Science China Information Sciences | VOL. 55
19 Jul 2012

Stable Periodic Frequent Itemset Mining on Uncertain Datasets
Ruimeng He ... Jinchao Chen
13 Aug 2021

A Novel Method to Generate Frequent Itemsets in Distributed Environment
Jingyi Zheng ... Xiaoheng Deng
01 Nov 2018

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream
Guangtao Wang ... Ying Zhang
ACM Transactions on Knowledge Discovery from Data | VOL. 16
21 Jul 2021
