Abstract

Association rule (AR) mining is a challenging task in data mining. Traditional algorithms generate a large number of candidate rules, and even after filtering with constraint measures such as support, confidence, and lift, many rules remain, so domain experts are still needed to extract the rules of interest. This paper focuses on two questions: whether rules can be ranked directly, and whether the proportional relationship between the items in a rule can be calculated. To address them, this paper proposes a modified FP-Growth algorithm called FP-GCID (a novel FP-Growth algorithm based on Cluster IDs) to generate ARs, together with a new method called Mean-Product of Probabilities (MPP) that ranks rules and computes the proportion of items within a rule. The experiment is divided into three phases: the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm clusters the geographic points of interest, and the resulting clusters are mapped into corresponding transaction data; FP-GCID generates ARs that contain cluster information; and MPP selects the best rule based on the rankings. Finally, the rules are visualized to validate whether the two requirements stated above are fulfilled.
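The filtering measures named above have standard textbook definitions: the support of an itemset is the fraction of transactions containing it, the confidence of A → C is supp(A ∪ C) / supp(A), and the lift is conf(A → C) / supp(C). The minimal sketch below computes them over transactions whose items are cluster labels; the labels `C1`–`C3` and the helper names are hypothetical, and neither the paper's thresholds nor its MPP ranking is reproduced here.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent: Itemset, consequent: Itemset, transactions: List[Itemset]) -> float:
    """supp(A | C) / supp(A): an estimate of P(C given A); 0.0 if A never occurs."""
    s_a = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / s_a if s_a else 0.0

def lift(antecedent: Itemset, consequent: Itemset, transactions: List[Itemset]) -> float:
    """conf(A -> C) / supp(C): > 1 positive association, 1 independence, < 1 negative."""
    s_c = support(consequent, transactions)
    return confidence(antecedent, consequent, transactions) / s_c if s_c else 0.0

# Hypothetical transactions whose items are cluster labels (illustration only).
transactions = [frozenset(t) for t in (
    {"C1", "C2"},
    {"C1", "C2", "C3"},
    {"C1", "C3"},
    {"C2", "C3"},
)]
A, C = frozenset({"C1"}), frozenset({"C2"})
print(support(A | C, transactions))  # 0.5
print(confidence(A, C, transactions))
print(lift(A, C, transactions))
```

Here the rule C1 → C2 has support 0.5, confidence about 0.667, and lift about 0.889, i.e. slightly below independence. Such a rule would survive a support threshold of 0.5 yet carries little interest, which illustrates why these measures alone still leave many rules for experts to inspect.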

Highlights

  • In the last two decades, association rule (AR) mining has become one of the most important tasks in the field of knowledge discovery.

  • This paper proposes a method for mining ARs among geographic points of interest in order to find the relationships between them, including quantitative relationships.

  • ARs are generated in the second phase by the FP-GCID algorithm, an improved version of the FP-Growth algorithm.
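FP-GCID is described as an improved version of FP-Growth, whose core data structure is the frequent pattern tree (FP-tree). The sketch below shows only the textbook FP-tree construction of plain FP-Growth: one scan counts item frequencies, and a second scan inserts each transaction's frequent items in descending-frequency order so that transactions with common prefixes share nodes. It is not FP-GCID, whose cluster-ID modifications are not described here, and it omits the recursive mining of conditional pattern bases. The class and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

class FPNode:
    """A tree node: an item plus the number of transactions sharing this prefix."""
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}

def build_fp_tree(transactions: List[List[str]],
                  min_count: int) -> Tuple[FPNode, Dict[str, List[FPNode]]]:
    # Scan 1: number of transactions containing each item.
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in freq.items() if c >= min_count}
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = defaultdict(list)  # item -> all nodes holding it
    # Scan 2: insert frequent items most-frequent-first (ties broken by name),
    # so that transactions with a common prefix reuse the same path.
    for t in transactions:
        items = sorted({i for i in t if i in frequent}, key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

transactions = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["a", "b", "d"]]
root, header = build_fp_tree(transactions, min_count=2)
print(sum(len(nodes) for nodes in header.values()))  # 6
```

The four example transactions hold 11 item occurrences, but the tree needs only 6 nodes, because every transaction shares the prefix `b` and three of them share `b → a`. The header table keeps, for each item, the list of nodes where it appears, which is what the mining phase walks to build conditional pattern bases.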


Summary

Introduction

In the last two decades, association rule (AR) mining has become one of the most important tasks in the field of knowledge discovery. The greatest advantage of the FP-Growth algorithm is that it compresses all transactions of the database into a frequent pattern tree, which retains the association information among the itemsets. Mining algorithms produce a large number of ARs, but not all of them are useful, so the rules of interest must be identified. DBSCAN has two input parameters: ε, the radius of the neighborhood, and μ, the density threshold, which is the minimum number of points required in the neighborhood of a core object. These two parameters help users find acceptable clusters. The main work of this paper is to propose a method of mining ARs for geographic points of interest in order to find the relationships between them, including quantitative relationships.
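To make the role of the two DBSCAN parameters concrete, here is a minimal from-scratch sketch in which `eps` plays the neighborhood radius and `mu` the density threshold μ. It assumes points are already in planar coordinates with Euclidean distance (raw latitude/longitude would need a great-circle distance instead), and it counts a point as part of its own neighborhood, a convention that varies between implementations. The function names and sample points are hypothetical; the paper's parameter values are not given here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1

def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points: Sequence[Point], eps: float, mu: int) -> List[int]:
    """Return a cluster id (0, 1, ...) or NOISE (-1) for every point."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < mu:          # not a core object; may later become a border point
            labels[i] = NOISE
            continue
        cluster_id += 1                  # i is a core object: start a new cluster
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:       # reachable from a core object: border point
                labels[j] = cluster_id
                continue
            if labels[j] is not None:    # already assigned to a cluster
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= mu:   # j is also a core object: keep expanding
                seeds.extend(j_neighbors)
    # Every point was visited by the outer loop, so no label is still None here.
    return [label if label is not None else NOISE for label in labels]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, mu=3))  # [0, 0, 0, 1, 1, 1, -1]
```

With `eps=1.5` and `mu=3`, the two tight groups of three points become clusters 0 and 1, and the isolated point is labeled noise; enlarging `eps` enough would merge everything into one cluster. The neighbor search here is O(n²) for clarity; a spatial index would be the usual choice for large point sets.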

Analysis of FP-Growth Approaches
Analysis of Approaches to Interestingness Measures
Analysis of Mining Association Rules with Clustering
Methodology
FP-GCID
Experiment and Analysis
Spatial Clustering with DBSCAN
Conversion to Transactional Data
Generating Frequent Itemsets
Association Rule Filtering
Finding Interesting Rules with MPP
Findings
Conclusions