Exploiting GPU and cluster parallelism in single scan frequent itemset mining

Youcef Djenouri,Djamel Djenouri,Asma Belhadi,Alberto Cano

doi:10.1016/j.ins.2018.07.020

Youcef Djenouri, Djamel Djenouri + Show 2 more

https://doi.org/10.1016/j.ins.2018.07.020

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

This paper considers discovering frequent itemsets in transactional databases and addresses the time complexity problem by using high performance computing (HPC). Three HPC versions of the Single Scan (SS) algorithm are proposed. The first one (GSS) implements SS on a GPU (Graphics Processing Unit) architecture using an efficient mapping between thread blocks and the input data. The second approach (CSS) implements SS on a cluster architecture by scheduling independent jobs to workers in a cluster. The third, (CGSS) accelerates the frequent itemset mining process by using multiple cluster nodes equipped with GPUs. Moreover, three partitioning strategies are proposed to reduce GPU thread divergence and cluster load imbalance. Results show that CGSS outperforms SS, GSS, and CSS in terms of speedup. Specifically, CGSS provides up to a 350 times speedup for low minimum support values on large datasets. GCSS demonstrably outperforms the state-of-the-art HPC-based algorithms on big databases.

Full Text