Abstract

Mining frequent item sets (FIs) is an important problem in data mining. To address the limitations of exact algorithms and sampling methods, a novel FI mining algorithm based on granular computing and fuzzy set theory (FI-GF) is proposed, which mines datasets with a large number of transactions more efficiently. First, granulation is applied, compressing the transactions into granules to reduce the scanning cost. During granulation, each granule is represented by a fuzzy set, and the number of transactions represented by each granule is optimized. Then, fuzzy set theory is used to compute the supports of item sets from those granules, which handles the uncertainty introduced by granulation and ensures the accuracy of the final results. Finally, Apriori is applied to obtain the FIs from the granules using the new support computation. On five datasets, FI-GF is compared with the original Apriori to demonstrate its reliability and efficiency, and with a representative progressive sampling method, RC-SS, to demonstrate the advantage of granulation over sampling. The results show that FI-GF reduces the time spent scanning transactions while remaining highly reliable, and that granulation has advantages over progressive sampling methods.

Highlights

  • Frequent item sets (FIs) contain items that appear together in a dataset with a frequency above a specified minimum support [1, 2]

  • Given a dataset containing the sales records of a store, if products A and B are found to be bought together by customers, and the frequency of this co-occurrence is above the minimum support, {A, B} can be regarded as an FI

  • A quantitative FI (QFI) is mined from a quantitative dataset, in which every item has a range of values
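
The definition of support and the store example above can be made concrete with a minimal sketch of level-wise (Apriori-style) mining. This is not the FI-GF algorithm, only an illustration of the exact baseline it builds on. The function names, the `baskets` data, and the inclusive threshold comparison (`>=`) are assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def apriori(transactions, min_support):
    """Return {itemset: support} for every item set meeting min_support.

    Illustrative only: each candidate's support is found by scanning all
    transactions, which is the cost that grows with the dataset size.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # candidate 1-item sets
    frequent = {}
    k = 1
    while current:
        # Keep the candidates whose support reaches the threshold
        # (inclusive comparison is an assumption of this sketch).
        level = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)

        # Join frequent k-item sets into (k+1)-item candidates and prune
        # any candidate that has an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical store records: A and B are bought together in 4 of 5 baskets.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "D"},
    {"C", "D"},
    {"A", "B", "C"},
]
result = apriori(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a minimum support of 0.5, {A, B} qualifies as an FI in this toy data because it occurs in four of the five baskets, while D alone does not.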


Summary

Introduction

Frequent item sets (FIs) contain items that appear together in a dataset with a frequency above a specified minimum support [1, 2]. If the number of transactions in a dataset is large, mining becomes slow. Another algorithm, FP-growth, was proposed to solve this problem [4]. One feasible way to reduce the cost of scanning is to sacrifice some accuracy in the results in exchange for a faster algorithm [6]. Following this idea, several sampling methods have been presented [7,8,9,10,11], in which the original dataset is first sampled randomly and the algorithm mines the results from those samples only, rather than from the whole dataset, which can greatly cut the cost. (2) Fuzzy sets are used to represent granules, and a method for calculating the supports of item sets from those granules is designed, which helps the algorithm handle the uncertainty introduced by transaction reduction.
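
The sampling idea described above, trading some accuracy for fewer transactions scanned, can be sketched as follows. This is a plain simple-random-sampling estimate, not RC-SS or any of the cited progressive sampling methods. The function name `sampled_support`, the fixed `seed`, and the synthetic `dataset` are assumptions made for illustration.

```python
import random


def sampled_support(itemset, transactions, sample_size, seed=0):
    """Estimate the support of `itemset` from a random sample of transactions.

    Only `sample_size` transactions are scanned instead of the whole dataset,
    so the result is an approximation of the exact support.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(transactions, sample_size)  # without replacement
    itemset = frozenset(itemset)
    hits = sum(1 for t in sample if itemset <= frozenset(t))
    return hits / sample_size


# Hypothetical dataset: {A, B} occurs in 700 of 1000 transactions.
dataset = [{"A", "B"}] * 700 + [{"C"}] * 300

exact = sampled_support({"A", "B"}, dataset, sample_size=len(dataset))
estimate = sampled_support({"A", "B"}, dataset, sample_size=200)
print(exact, estimate)
```

Sampling the full dataset reproduces the exact support, while the 200-transaction estimate scans a fifth of the data at the price of some sampling error.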

Basic Concepts
The Proposed Algorithm FI-GF
The Experiments and Discussions
Results of Apriori
Conclusions
