A Scalable Approach for Improving Implementation of a Frequent Pattern Mining Algorithm using MapReduce Programming

Md Abed Hasan, Naima Hassan, Md Hasibuzzaman, Mohammad Rezwanul Huq

doi:10.1109/icsitech46713.2019.8987446

Abstract

A frequent pattern is a pattern (a set of items, subsequences, sub-graphs, etc.) that occurs frequently in a transactional database. Frequent pattern mining is highly beneficial in domains such as extracting knowledge from transactional data for market basket analysis, cross-marketing, and selling. Since the inception of frequent itemset mining (FIM), a number of important FIM algorithms have been developed to speed up mining performance. Unfortunately, when the dataset is massive, mining can still be prohibitively expensive in terms of communication cost, memory usage, balanced data distribution, and I/O utilization. One existing frequent pattern mining algorithm, CATS Tree (Compressed and Arranged Transaction Sequences Tree), can perform interactive mining with a single scan of the database. In this work, we propose to parallelize a part of the CATS-Tree algorithm across distributed machines, which improves the overall performance of CATS-Tree on large transaction data. The proposed algorithm partitions the computation so that each machine executes an independent group of mining tasks. We present a comparison based on time complexity, algorithm complexity, and performance on different types of datasets. The results show that the proposed parallel implementation of CATS-Tree provides better performance for massive datasets.

Full Text