A Parallel Apriori-Transaction Reduction Algorithm Using Hadoop-Mapreduce in Cloud

A L Sayeth Saabith,Azuraliza Abu Bakar,Elankovan Sundararajan

doi:10.9734/ajrcos/2018/v1i124719

Abstract

    Apriori algorithm is a classical algorithm of association rule mining and widely used for generating frequent item sets. However, the original Apriori algorithm has some limitation such as it needs to scan the dataset many times to discover all frequent itemset and generate huge number of candidate itemset. To overcome these limitations, researchers have made a lot of improvements to the Apriori such as candidate generation, without candidate generation, transaction reduction, partitioning, and sampling. When it comes to mine massive data, these algorithms failed to prove efficiency because limitation of the processing capacity, storage capacity, and main memory constraints. Therefore, parallel and distributed algorithms are developed to perform large-scale computing in ARM on multiple processors. However, the problems with most of the parallel and distributed framework are overheads of managing distributed system, lack of high level parallel programming language, and node failures. Hadoop-MapReduce is an efficient, scalable, and simplified programming model for massive data processing and it also available on cloud environment. Cloud computing offers huge computing resources, and capacities to solve big data challenges. Recently many parallel algorithms have been proposed on Hadoop-MapReduce to enhance the performance of Apriori algorithm but there are some drawbacks: since multiple scan over the dataset is needed to generate candidate itemset, it consume more execution time. The aim of this study is to propose a parallel Transaction Reduction MapReduce Apriori algorithm (TRMR-Apriori) which is reduce unnecessary transaction values and transactions from the dataset in parallel manner to overcome above problems. The experiments show that TRMR-Apriori is able to achieve better execution time to discover frequent itemset those of previous sequential ARM algorithms such as Apriori, AprioriTid, Eclat, and FP-Growth and the previous parallel algorithms such as PApriori, MRApriori, and Modified Apriori with different condition on homogeneous computing environment using Hadoop-MapReduce platform in cloud. Overall, the TRMR-Apriori shows the strength to extract the frequent itemset from massive dataset in cloud.    

Full Text