Distributed Association Rule Mining

Mafruz Zaman Ashrafi

doi:10.4018/978-1-60566-010-3.ch108

Abstract

Data mining is an iterative and interactive process that explores and analyzes voluminous digital data to discover valid, novel, and meaningful patterns (Mohammed, 1999). Because such data may comprise terabytes of records, data mining relies on computationally efficient techniques to find patterns. It is related to exploratory data analysis, a subarea of statistics. During the past decade, data mining techniques have been used in various business, government, and scientific applications. Association rule mining (Agrawal, Imielinski & Swami, 1993) is one of the most studied fields in the data-mining domain. The key strength of association mining is completeness: it can discover all associations within a given dataset. Two important constraints of association rule mining are support and confidence (Agrawal & Srikant, 1994). These constraints are used to measure the interestingness of a rule. The motivation for association rule mining comes from market-basket analysis, which aims to discover customer purchase behavior. However, its applications are not limited to market-basket analysis; it is also used in areas such as network intrusion detection and credit card fraud detection.

The widespread use of computers and the advances in network technologies have enabled modern organizations to distribute their computing resources among different sites. The business applications used by such organizations normally store their day-to-day data at each respective site, and this data grows every day. Discovering useful patterns in it with a centralized data mining approach is not always feasible, because merging datasets from different sites into a centralized site incurs large network communication costs (Ashrafi, David & Kate, 2004). Furthermore, the data of these organizations is not only distributed over various locations but is also fragmented vertically, which makes it more difficult, if not impossible, to combine in a central location. Distributed Association Rule Mining (DARM) has therefore emerged as an active subarea of data-mining research.

Consider the following example. A supermarket may have several data centers spread over various regions across the country, and each of these centers may hold gigabytes of data. To find customer purchase behavior in these datasets, one can run an association rule mining algorithm in one of the regional data centers. However, applying a mining algorithm to a single data center will not reveal all the potential patterns, because customer purchase patterns vary from one region to another. To obtain all potential patterns, we must rely on some kind of distributed association rule mining algorithm that incorporates all data centers.

Distributed systems, by nature, require communication. Since distributed association rule mining algorithms generate rules from different datasets spread over various geographical sites, they require external communication in every step of the process (Ashrafi, David & Kate, 2004; Assaf & Ron, 2002; Cheung, Ng, Fu & Fu, 1996). As a result, DARM algorithms aim to reduce communication costs so that the total cost of generating global association rules is less than the cost of combining the datasets of all participating sites into a centralized site.
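The supermarket example can be illustrated with a minimal sketch. It is not any particular DARM algorithm from the cited literature; it only shows the basic idea that sites can exchange itemset counts rather than raw transactions, and that support and confidence can then be computed globally. The site names, the toy transactions, and the helper functions (`local_counts`, `global_counts`, `support`, `confidence`) are all hypothetical, and the sketch assumes the data is partitioned horizontally (each site holds complete transactions), which is simpler than the vertically fragmented case mentioned above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each regional site holds its own transactions.
SITES = {
    "north": [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}],
    "south": [{"milk"}, {"bread", "milk"}, {"butter"}],
}

def local_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items within one site's data."""
    counts = Counter()
    for transaction in transactions:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(transaction), k):
                counts[frozenset(itemset)] += 1
    return counts

def global_counts(sites):
    """Sum per-site counts; only counts, not raw transactions, are combined."""
    total = Counter()
    n_transactions = 0
    for transactions in sites.values():
        total.update(local_counts(transactions))
        n_transactions += len(transactions)
    return total, n_transactions

def support(counts, n_transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return counts[frozenset(itemset)] / n_transactions

def confidence(counts, antecedent, consequent):
    """Estimate of P(consequent | antecedent); assumes the antecedent occurs."""
    both = counts[frozenset(antecedent) | frozenset(consequent)]
    return both / counts[frozenset(antecedent)]

total, n = global_counts(SITES)
print(support(total, n, {"bread", "milk"}))      # global support of {bread, milk}
print(confidence(total, {"bread"}, {"milk"}))    # global confidence of bread -> milk

# The same rule evaluated on a single site gives a different answer,
# which is why mining one regional data center alone is not enough.
north = local_counts(SITES["north"])
print(confidence(north, {"bread"}, {"milk"}))
```

A practical algorithm would avoid counting every itemset at every site, since that is exactly the kind of communication and computation overhead that DARM methods try to reduce; the exhaustive counting here is only to keep the sketch short.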

