Frequent Itemset Mining for Big Data Using Greatest Common Divisor Technique

Mohamed A Gawwad, Magda B Fayek, Mona F Ahmed

doi:10.5334/dsj-2017-025

Abstract

The discovery of frequent itemsets is one of the most important topics in data mining. Frequent itemset discovery techniques help generate qualitative knowledge that gives business insight and supports decision makers. In the Big Data era, a customizable algorithm that can process big datasets in a reasonable time has become a necessity. In this paper we propose a new algorithm for frequent itemset discovery that can work in a distributed manner on big datasets. Our approach is based on the original Buddy Prima algorithm and on Greatest Common Divisor (GCD) calculations between the itemsets in the transaction database. The proposed algorithm introduces a new method to parallelize frequent itemset mining without generating candidate itemsets, and it avoids any communication overhead between the participating nodes. When run on a single node, it exploits the parallelism available in the hardware. The proposed approach could be implemented using MapReduce or Spark. It was successfully applied to transaction databases of different sizes and compared, at different support levels, with two well-known algorithms: FP-Growth and Parallel Apriori. The experiments showed that the proposed algorithm achieves a major time improvement over both algorithms, especially on datasets with a huge number of items.

Highlights

Frequent itemsets discovery “is one of the most important techniques in data mining” (Zhengui Li, 2012)
In this paper we propose a parallelizable algorithm for frequent itemset mining (FIM) that can handle big datasets by exploiting the multicore capabilities of the hardware
We used the Retail dataset to demonstrate this capability of the proposed algorithm, POBPA

Summary

Introduction

Frequent itemsets discovery “is one of the most important techniques in data mining” (Zhengui Li, 2012). It can uncover association relationships among events or data objects that are hidden in the data, even when the associated events or objects do not seem related at all. The literature contains many approaches that tackle the FIM problem, such as Apriori, FP-Growth, multi-level frequent itemsets, DHP (Direct Hashing and Pruning), maximal association rule mining, primitive association rules, soft-matching rules and Buddy Prima. An association rule such as cheese ⇒ chips (80%) states that four out of five customers who bought cheese also bought chips. Such rules can be useful for decisions concerning product pricing, promotions, store layout and many others.
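The 80% figure in the cheese and chips rule is the rule's confidence: among the transactions that contain cheese, the share that also contain chips. The following is a minimal Python sketch of that calculation. The basket data and the function names `support_count` and `confidence` are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def support_count(transactions, itemset):
    """Count the transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    base = support_count(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical toy baskets (illustrative only, not data from the paper).
baskets = [
    {"cheese", "chips"},
    {"cheese", "chips", "milk"},
    {"cheese", "chips", "bread"},
    {"cheese", "chips"},
    {"cheese", "bread"},
    {"milk", "bread"},
]

# Five baskets contain cheese and four of those also contain chips.
rule_confidence = confidence(baskets, {"cheese"}, {"chips"})
print(rule_confidence)
```

Here five baskets contain cheese and four of them also contain chips, so the rule cheese ⇒ chips has a confidence of four out of five.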

Literature Review

Prime Numbers Representation Algorithms

Data Preparation

Frequent Itemsets Deduction using GCD
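
The abstract describes combining a prime-number representation of itemsets with GCD calculations between them. The sketch below illustrates only the arithmetic fact that makes such a combination possible, and it is not the authors' algorithm: if every item is mapped to a distinct prime and a transaction is encoded as the product of its items' primes, then the GCD of two transaction codes is the code of the items they share. The helper names (`first_primes`, `encode`, `decode`) and the toy transactions are assumptions made for illustration.

```python
from math import gcd, prod


def first_primes(n):
    """Return the first n primes by trial division (fine for small examples)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(transactions):
    """Map each item to a distinct prime and each transaction to a product of primes."""
    items = sorted({item for t in transactions for item in t})
    prime_of = dict(zip(items, first_primes(len(items))))
    codes = [prod(prime_of[item] for item in t) for t in transactions]
    return prime_of, codes


def decode(code, prime_of):
    """Recover the items whose primes divide `code`."""
    return {item for item, p in prime_of.items() if code % p == 0}


# Hypothetical toy transactions (illustrative only, not data from the paper).
transactions = [
    {"bread", "cheese", "chips"},
    {"cheese", "chips", "milk"},
    {"bread", "milk"},
]

prime_of, codes = encode(transactions)

# The GCD of two codes is the product of the primes of the shared items.
shared = decode(gcd(codes[0], codes[1]), prime_of)
print(codes, shared)
```

With the items sorted alphabetically, bread, cheese, chips and milk map to 2, 3, 5 and 7, so the three transactions are encoded as 30, 105 and 14. The GCD of 30 and 105 is 15 = 3 × 5, which decodes back to cheese and chips. How the paper turns this property into a full mining procedure is not stated in the text above.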

Experimental Results

Conclusion and future work

Journal: Data Science Journal	Publication Date: May 18, 2017
Citations: 6	License type: CC BY 4.0
