Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers

Chun-Hong Huang, Yungho Leu

doi:10.1007/s10586-017-1609-6

Abstract

Frequent itemset mining is time-consuming for large datasets, and many parallel frequent itemset mining algorithms have been proposed to speed up the mining process. This paper presents a parallel frequent itemset mining algorithm for a cluster of personal computers. To facilitate parallel mining, we use a prefix-path-based method to decompose a transactional dataset into its frequent 1-itemset sub-datasets. We call the algorithm built on this decomposition, as implemented on our PC cluster platform, the single-level parallel frequent itemset mining (SLPFIM) algorithm. To mitigate the bottleneck caused by time-consuming 1-itemset sub-datasets, we propose a multi-level parallel frequent itemset mining (MLPFIM) algorithm that further decomposes the time-consuming 1-itemset sub-datasets into their corresponding sub-sub-datasets. The finer granularity of the sub-sub-datasets improves load balancing in parallel frequent itemset mining. The experimental results show that SLPFIM achieved a maximum speedup of 11.9x over the non-parallel execution of the FP-Growth algorithm, while MLPFIM achieved a maximum speedup of 23.1x over the same baseline. The results also show that MLPFIM achieved a maximum speedup of 2.14x over SLPFIM.
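The decomposition idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it works on plain lists rather than FP-trees, runs in a single process, and all names (`decompose`, `mine`, `build_tasks`, `heavy_size`) are hypothetical. In particular, the sketch assumes that a sub-dataset is "time-consuming" when it holds more than `heavy_size` prefix paths, which is only a stand-in for whatever cost criterion the paper uses.

```python
from collections import Counter


def decompose(transactions, min_support):
    """Split a dataset into one sub-dataset per frequent item.

    Items are ordered by descending support (ties broken by name). The
    sub-dataset of item x holds, for every transaction containing x, the
    prefix of frequent items that precede x in that order.
    """
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}
    subs = {item: [] for item in order}
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        for pos, item in enumerate(ordered):
            subs[item].append(ordered[:pos])  # empty prefixes keep the count exact
    return subs


def mine(transactions, min_support, suffix=()):
    """Enumerate frequent itemsets that extend `suffix` by recursive decomposition."""
    result = {}
    for item, sub in decompose(transactions, min_support).items():
        grown = suffix + (item,)
        result[tuple(sorted(grown))] = len(sub)  # support = number of prefix paths
        result.update(mine(sub, min_support, grown))
    return result


def build_tasks(transactions, min_support, heavy_size):
    """Return (found, tasks): itemsets counted so far and independent mining tasks.

    Assumption: a 1-itemset sub-dataset with more than `heavy_size` prefix
    paths is treated as time-consuming and is decomposed one level further.
    """
    found, tasks = {}, []
    for x, sub in decompose(transactions, min_support).items():
        found[(x,)] = len(sub)
        if len(sub) <= heavy_size:
            tasks.append(((x,), sub))  # single-level task
            continue
        for y, subsub in decompose(sub, min_support).items():
            found[tuple(sorted((x, y)))] = len(subsub)
            tasks.append(((x, y), subsub))  # finer-grained second-level task
    return found, tasks


def run_tasks(found, tasks, min_support):
    """Mine every task and merge the results (sequential stand-in for the cluster)."""
    merged = dict(found)
    for suffix, data in tasks:
        merged.update(mine(data, min_support, suffix))
    return merged
```

Each entry in `tasks` depends only on its own sub-dataset, so in a parallel setting the entries could be handed to different workers and the partial results merged afterwards; `run_tasks` simply performs that loop sequentially. Lowering `heavy_size` produces more, smaller tasks, which is the load-balancing effect the multi-level decomposition is aiming for.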

Full Text