A fast and parallel algorithm for frequent pattern mining from big data in many-task environments

Wei Tee Lin,Chih Ping Chu

doi:10.1504/ijhpcn.2017.084244

Abstract

Many studies have tried to efficiently discover frequent patterns in large databases. The algorithms used in these studies fall into two main categories: apriori algorithms and frequent pattern growth (FP-growth) algorithms. Apriori algorithms operate according to a generate-and-test approach, so performance suffers from the testing of too many candidate itemsets. Therefore, most recent studies have applied an FP-growth approach to the discovery of frequent patterns. The rapid growth of data, however, has introduced new challenges for the mining of frequent patterns, in terms of both execution efficiency and scalability. Big data often contains a large number of items, a large number of transactions and long average transaction length, which result in large FP-trees. In addition to its dependence on data characteristics, FP-tree size is also sensitive to the minimum support threshold. This is because the small support is probable to bring many branches for nodes, greatly enlarging the FP-tree and the number of reconstructed conditional pattern-based trees. In this paper, we propose a novel algorithm and architecture for efficiently mining frequent patterns from big data in distributed many-task computing environments. Through empirical evaluation of various simulation conditions, we show that the proposed method delivers excellent execution time.

Full Text