Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster

P.K Mishra, Rakhi Garg, Sudhakar Singh

doi:10.5120/ijca2015906632

Abstract

Mining frequent itemsets from massive datasets has always been one of the most important problems in data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed that address the design of efficient data structures, the minimization of database scans, and parallel and distributed processing. MapReduce is an emerging parallel and distributed technology for processing big datasets on a Hadoop cluster. To mine big datasets, it is essential to redesign data mining algorithms for this new paradigm. In this paper, we implement three variations of the Apriori algorithm on the MapReduce paradigm, using the data structures hash tree, trie, and hash table trie (i.e., a trie with a hashing technique). We emphasize and investigate the significance of these three data structures for the Apriori algorithm on a Hadoop cluster, which has not yet received attention. Experiments carried out on both real-life and synthetic datasets show that the hash table trie performs far better than the trie and the hash tree in terms of execution time. Moreover, the hash tree performs the worst of the three.
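To make the role of the data structure concrete, the following is a minimal single-machine sketch of support counting for Apriori candidates stored in a trie whose nodes keep their children in a hash table. It is not the authors' implementation: the names `TrieNode`, `build_trie`, `count_supports` and `frequent_itemsets` are hypothetical, a Python `dict` stands in for the per-node hash table, and the MapReduce and Hadoop parts described in the abstract are omitted entirely.

```python
class TrieNode:
    """One trie node; names and layout are illustrative assumptions."""

    def __init__(self):
        self.children = {}  # item -> TrieNode (dict acts as the per-node hash table)
        self.count = 0      # support count, meaningful only at depth k


def build_trie(candidates):
    """Insert each candidate k-itemset as a root-to-leaf path of sorted items."""
    root = TrieNode()
    for itemset in candidates:
        node = root
        for item in sorted(itemset):
            node = node.children.setdefault(item, TrieNode())
    return root


def count_supports(root, transaction, k):
    """Increment the count of every candidate k-itemset contained in the transaction."""
    items = sorted(set(transaction))

    def walk(node, start, depth):
        if depth == k:
            node.count += 1
            return
        # Stop early when too few items remain to complete a k-itemset.
        for i in range(start, len(items) - (k - depth) + 1):
            child = node.children.get(items[i])  # hash lookup instead of a scan
            if child is not None:
                walk(child, i + 1, depth + 1)

    walk(root, 0, 0)


def frequent_itemsets(root, k, min_support):
    """Collect candidates whose count reaches the minimum support threshold."""
    result = {}

    def collect(node, prefix):
        if len(prefix) == k:
            if node.count >= min_support:
                result[tuple(prefix)] = node.count
            return
        for item, child in node.children.items():
            collect(child, prefix + [item])

    collect(root, [])
    return result
```

A typical use would build the trie once per Apriori pass from the candidate k-itemsets, call `count_supports` for each transaction, and then call `frequent_itemsets` with the chosen threshold. A plain trie would differ mainly in how a child is located at each node (for example, by searching an ordered list of children rather than hashing).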

Journal: International Journal of Computer Applications
Publication Date: Oct 15, 2015
Citations: 29

