Hadoop-HBase for finding association rules using Apriori MapReduce algorithm

Ashwini A Pandagale, Anil R Surve

doi:10.1109/rteict.2016.7807935

Abstract

Pattern discovery is an important part of knowledge discovery in databases and falls under data mining. Association rule mining is one of the most popular and revealing data mining techniques for discovering useful patterns. It plays a key role in decision making by discovering useful relations between attributes in a database. To do this, frequent itemsets must first be computed, followed by candidate itemsets. Frequent 1-itemsets can be generated easily, but generating frequent 2-itemsets is costly in both time and space. This time and space overhead in generating frequent 2-itemsets is the issue addressed in this paper. For higher I/O throughput, frequent itemsets must be generated as quickly and as space-efficiently as possible. To make this possible, the intermediate data generated by pairing each item with every other item in an itemset requires random read/write access. Apache HBase provides low-latency random access to such data. Experimental results show that when the dataset is stored in HBase on HDFS, the Apriori MapReduce algorithm finds association rules with better time and space efficiency.
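
The pairing step described in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: it uses neither Hadoop nor HBase, and the function names (`map_pairs`, `reduce_counts`, `frequent_pairs`), the sample transactions, and the support threshold are all assumptions made for illustration. The map step emits every unordered item pair in a transaction, and the reduce step sums the counts per pair; the per-pair counts are the intermediate data that the paper proposes to keep in a random-access store.

```python
from collections import Counter
from itertools import combinations


def map_pairs(transaction):
    """Map step (hypothetical): emit each unordered item pair once per transaction."""
    for pair in combinations(sorted(set(transaction)), 2):
        yield pair, 1


def reduce_counts(emitted):
    """Reduce step (hypothetical): sum the counts emitted for each pair."""
    counts = Counter()
    for pair, n in emitted:
        counts[pair] += n
    return counts


def frequent_pairs(transactions, min_support):
    """Return the pairs whose support count is at least min_support."""
    emitted = (kv for t in transactions for kv in map_pairs(t))
    counts = reduce_counts(emitted)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Illustrative data only; not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["milk", "bread"],
]
result = frequent_pairs(transactions, min_support=3)
print(result)
```

A transaction with n distinct items emits n(n-1)/2 pairs, which is why the 2-itemset stage dominates the cost. A full Apriori pass would also discard pairs containing an infrequent single item before counting; the sketch omits that pruning for brevity.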

Full Text