Efficient Mining of Association Rules based on Clustering from Distributed Data

Marwa Bouraoui, Amel Grissa

doi:10.14569/ijacsa.2019.0100449

Abstract

Data analysis techniques need to be improved to allow the processing of data. One of the most commonly used techniques is association rule mining, whose rules detect facts that often occur together within a dataset. Unfortunately, existing methods generate a large number of association rules without regard to their relevance and utility, which complicates the interpretation of the results. In this paper, we propose a new approach for mining association rules that emphasizes ease of assimilation and exploitation of the knowledge they carry. Our approach addresses these shortcomings while efficiently and intelligently reducing the size of the rule set. Specifically, we propose to optimize the size of the extraction contexts by taking advantage of clustering techniques. We then extract frequent itemsets and rules in the form of Meta-itemsets and Meta-rules, respectively. Experiments on benchmark datasets show that our approach significantly reduces the number of generated rules and thereby speeds up execution.

Highlights

Association rule mining has become one of the core data mining tasks, with many real-world applications such as selective marketing, web fraud detection, and economic census
The main idea is to mine distributed frequent itemsets from a representative set consisting of a collection of classes, called Meta-Itemsets. We then mine rules in the form of Meta Association Rules
We present in the following the general principle of our approach:
Input: N sites; D[n] (n = 1..N), a set of data distributed across the N sites; s, the local minSupp; S, the global minSupp; C, the number of clusters; α, the accuracy
Step 1: Iterative pre-processing phase based on clustering
For each site i (i = 1..N): apply a fuzzy clustering algorithm to organize the data into different groups
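
Step 1 can be illustrated with a minimal sketch. It assumes plain fuzzy c-means as the fuzzy clustering algorithm (the text does not name one) and reads α as a convergence threshold on how far the cluster centers move between iterations. The function names, the fuzzifier m, and the initialization scheme are hypothetical choices made for illustration only.

```python
import math

def fuzzy_c_means(points, c, alpha=1e-4, m=2.0, max_iter=100):
    """Plain fuzzy c-means on a list of equal-length numeric tuples.

    c     -- number of clusters (C in the text)
    alpha -- stop once no center moves more than this (assumed meaning of "accuracy")
    m     -- fuzzifier; 2.0 is the conventional default (assumption)
    Returns (centers, memberships); memberships[i][j] is the degree of
    point i in cluster j, computed against the centers of the last iteration.
    """
    dim = len(points[0])
    # Deterministic start: spread the initial centers across the sorted points.
    ordered = sorted(points)
    centers = [ordered[(k * (len(ordered) - 1)) // max(c - 1, 1)] for k in range(c)]
    memberships = []
    for _ in range(max_iter):
        memberships = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if any(d == 0 for d in dists):
                # The point sits exactly on one or more centers: split membership among them.
                row = [1.0 if d == 0 else 0.0 for d in dists]
            else:
                row = [d ** (-2.0 / (m - 1)) for d in dists]
            total = sum(row)
            memberships.append([v / total for v in row])
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total == 0:
                new_centers.append(centers[j])  # empty cluster: keep the old center
                continue
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < alpha:
            break
    return centers, memberships

def cluster_sites(sites, c, alpha):
    """Apply the clustering independently at every site, as in Step 1."""
    return [fuzzy_c_means(data, c, alpha) for data in sites]

# Toy data for two sites (hypothetical values).
sites = [
    [(1.0,), (1.2,), (0.8,), (9.0,), (9.2,), (8.8,)],
    [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)],
]
for site_id, (centers, memberships) in enumerate(cluster_sites(sites, c=2, alpha=1e-6), 1):
    hard_labels = [row.index(max(row)) for row in memberships]
    print(site_id, centers, hard_labels)
```

Each site is clustered on its own data only, so nothing in this sketch requires moving raw records between sites. How the resulting groups are turned into Meta-Itemsets is not covered here.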

Summary

INTRODUCTION

Association rule mining has become one of the core data mining tasks, with many real-world applications such as selective marketing, web fraud detection, and economic census. However, an extensive output of rules only adds inconvenience to the data exploitation task, since mined rules rely heavily on human interpretation to infer their semantic meaning. To overcome these shortcomings, the solution we consider is to combine clustering and association rule mining technologies to efficiently mine rules from large distributed data. We propose a new approach, Clustering based Distributed Association Rules Mining Algorithm (C-DARM), which continues to extract rules from business data but avoids producing an irrelevant and excessive number of results. Our aim is to refine the output for a better understanding and a simpler interpretation of the knowledge it carries. To do this efficiently, we propose to introduce a pre-processing step based on clustering to optimize the size of the remote extraction contexts. Finally, we conclude our paper and present our ideas for future work.
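
To make the combination of clustering and rule mining concrete, here is a minimal sketch, not the C-DARM algorithm itself. It assumes each record has already been replaced by the set of cluster labels it belongs to, and then mines frequent itemsets and rules over those labels using the standard support and confidence measures. The label strings, thresholds, and function names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_supp):
    """Level-wise search: return {itemset: support} for itemsets with support >= min_supp."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        level = {}
        for cand in current:
            supp = sum(1 for t in transactions if cand <= t) / n
            if supp >= min_supp:
                level[cand] = supp
        frequent.update(level)
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates.
        size += 1
        current = list({a | b for a, b in combinations(level, 2) if len(a | b) == size})
    return frequent

def association_rules(frequent, min_conf):
    """Return (lhs, rhs, support, confidence) for every rule meeting min_conf."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent, so the lookup succeeds.
                conf = supp / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules

# Each transaction is a set of cluster labels (hypothetical labels).
transactions = [
    frozenset({"age:young", "income:low"}),
    frozenset({"age:young", "income:low"}),
    frozenset({"age:young"}),
    frozenset({"income:low", "region:north"}),
]
found = frequent_itemsets(transactions, min_supp=0.5)
for lhs, rhs, supp, conf in association_rules(found, min_conf=0.6):
    print(sorted(lhs), "->", sorted(rhs), round(supp, 2), round(conf, 2))
```

Because the items are cluster labels rather than raw attribute values, the number of distinct items, and therefore the number of candidate rules, is bounded by the number of clusters chosen in the pre-processing step.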

LITERATURE REVIEW

MOTIVATION

NEW APPROACH

Process of Mining Association Rules

VALIDATION AND EXPERIMENTAL RESULTS

CONCLUSION