Abstract

This paper considers the discovery of frequent itemsets in transactional databases and addresses its time complexity using high-performance computing (HPC). Three HPC versions of the Single Scan (SS) algorithm are proposed. The first (GSS) implements SS on a GPU (Graphics Processing Unit) architecture using an efficient mapping between thread blocks and the input data. The second (CSS) implements SS on a cluster architecture by scheduling independent jobs to the workers in the cluster. The third (CGSS) accelerates the frequent itemset mining process by using multiple cluster nodes equipped with GPUs. Moreover, three partitioning strategies are proposed to reduce GPU thread divergence and cluster load imbalance. Results show that CGSS outperforms SS, GSS, and CSS in terms of speedup. Specifically, CGSS provides up to a 350-fold speedup for low minimum support values on large datasets. CGSS also outperforms the state-of-the-art HPC-based algorithms on big databases.
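
To make the mining task concrete, the following is a minimal sequential sketch of frequent itemset counting in one pass over the transactions. It is an illustration only, not the paper's SS, GSS, CSS, or CGSS implementation: the function name `frequent_itemsets`, the toy transactions, and the choice to enumerate every subset of each transaction are assumptions made for this example. Subset enumeration is exponential in the transaction length, so the sketch is only practical for short transactions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets with relative support >= min_support.

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = Counter()
    for transaction in transactions:  # each transaction is read exactly once
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # An itemset is frequent if it occurs in at least this many transactions.
    threshold = min_support * len(transactions)
    return {itemset: n for itemset, n in counts.items() if n >= threshold}


# Toy database (assumed data, for illustration only).
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b"],
]
result = frequent_itemsets(transactions, min_support=0.5)
```

Each transaction is processed independently of the others in this sketch, which is the kind of independence that lets the work be split across GPU thread blocks or cluster workers. Uneven transaction lengths produce uneven amounts of work per transaction, which is consistent with the thread divergence and load imbalance that the partitioning strategies target.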
